Science.gov

Sample records for aspergillus genome database

  1. Screening of Aspergillus-derived FAD-glucose dehydrogenases from fungal genome database.

    PubMed

    Mori, Kazushige; Nakajima, Mitsuharu; Kojima, Katsuhiro; Murakami, Koudai; Ferri, Stefano; Sode, Koji

    2011-11-01

    Aspergillus-derived FAD-dependent glucose dehydrogenases (FADGDHs) were screened from fungal genomic databases, primarily by searching for putative homologues of the Aspergillus niger-derived glucose oxidase (GOD). Focusing on a GOD active-site motif, putative proteins annotated as belonging to the glucose methanol choline (GMC) oxidoreductase family were selected. Phylogenetic analysis of these putative proteins produced a GOD clade, which includes the A. niger and Penicillium amagasakiens GODs, and a second clade made up of putative proteins showing 30-40% homology with GOD. The genes encoding the proteins from the second clade were functionally expressed in Escherichia coli, resulting in dye-mediated glucose dehydrogenase (GDH) activity but not GOD activity. These results suggest that the putative proteins belonging to the second clade are FADGDHs. The 3D structure models of these FADGDHs were compared with the 3D structure of GOD. PMID:21748361

  2. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. Genomics of Aspergillus flavus mycotoxin production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The aspergilli show immense ecological and metabolic diversity. To date, the sequences of fifteen different Aspergillus genomes have been determined providing scientists with an exciting resource to improve the understanding of Aspergillus molecular genomics. Aspergillus flavus, one of the most wide...

  4. Draft Genome Sequences of Fungus Aspergillus calidoustus.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A; Petzke, Lutz; Valiante, Vito

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  5. Draft Genome Sequences of Fungus Aspergillus calidoustus

    PubMed Central

    Horn, Fabian; Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A.; Petzke, Lutz

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  6. Aspergillus flavus Genomics for Controlling Aflatoxin Contamination

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The main objectives of the Aspergillus flavus genomics program are to identify genes and regulatory components involved in aflatoxin biosynthesis for solving aflatoxin contamination in agricultural crops. A. flavus Expressed Sequence Tags (EST), microarray and whole genome sequencing have been achi...

  7. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  8. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one which most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models ( ~;;2percent per genome), supported by comparative analysis, additionally correcting ~;;18percent of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~;;3percent increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.

  9. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  10. Genomic mining for Aspergillus natural products.

    PubMed

    Bok, Jin Woo; Hoffmeister, Dirk; Maggio-Hall, Lori A; Murillo, Renato; Glasner, Jeremy D; Keller, Nancy P

    2006-01-01

    The genus Aspergillus is renowned for its ability to produce a myriad of bioactive secondary metabolites. Although the propensity of biosynthetic genes to form contiguous clusters greatly facilitates assignment of putative secondary metabolite genes in the completed Aspergillus genomes, such analysis cannot predict gene expression and, ultimately, product formation. To circumvent this deficiency, we have examined Aspergillus nidulans microarrays for expressed secondary metabolite gene clusters by using the transcriptional regulator LaeA. Deletion or overexpression of laeA clearly identified numerous secondary metabolite clusters. A gene deletion in one of the clusters eliminated the production of the antitumor compound terrequinone A, a metabolite not described, from A. nidulans. In this paper, we highlight that LaeA-based genome mining helps decipher the secondary metabolome of Aspergilli and provides an unparalleled view to assess secondary metabolism gene regulation. PMID:16426969

  11. Genomics of peanut-Aspergillus flavus interactions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination caused by Aspergillus fungi is a great concern in peanut production worldwide. Pre-harvest Aspergillii infection and aflatoxin contamination are usually severe in peanuts that are grown under drought stressed conditions. Genomic research can provide new tools and resources to...

  12. Mouse genome database 2016

    PubMed Central

    Bult, Carol J.; Eppig, Janan T.; Blake, Judith A.; Kadin, James A.; Richardson, Joel E.

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  13. Mouse genome database 2016.

    PubMed

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  14. Genomic Islands in Pathogenic Filamentous Fungus Aspergillus fumigatus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present the genome sequences of a new clinical isolate, CEA10, of an important human pathogen, Aspergillus fumigatus, and two closely related, but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of CEA10 with the recently sequen...

  15. What can Aspergillus flavus genome offer for mycotoxin research?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomic study of filamentous fungi has made significant advances in recent years, and the genomes of several species in the genus Aspergillus have been sequenced, including Aspergillus flavus. This ubiquitous mold is present as a saprobe in a wide range of agricultural and natural habits, and c...

  16. A first glance into the genome sequence of Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins, produced by Aspergillus flavus and A. parasiticus, are toxic and carcinogenic metabolites. They contaminate agricultural crops before harvest and post harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergillus flavus genomics p...

  17. WHOLE GENOME COMPARISON OF ASPERGILLUS FLAVUS AND A. ORYZAE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a plant and animal pathogen that also produces the potent carcinogen aflatoxin. Aspergillus oryzae is a closely related species that has been used for centuries in the food fermentation industry and is generally regarded as safe (GRAS). Whole genome sequences for these two fu...

  18. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  19. CROP GENOME DATABASES -- CRITICAL ISSUES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Crop genome databases, see www.agron.missouri.edu/bioservers.html of the past decade have had designed and implemented (1) models and schema for the genome and related domains; (2) methodologies for input of data by expert biologists and high-throughput projects; and (3) various text, graphical, and...

  20. Human genome protein function database.

    PubMed Central

    Sorenson, D. K.

    1991-01-01

    A database which focuses on the normal functions of the currently-known protein products of the Human Genome was constructed. Information is stored as text, figures, tables, and diagrams. The program contains built-in functions to modify, update, categorize, hypertext, search, create reports, and establish links to other databases. The semi-automated categorization feature of the database program was used to classify these proteins in terms of biomedical functions. PMID:1807638

  1. Aspergillus genomes: secret sex and the secrets of sex.

    PubMed

    Scazzocchio, Claudio

    2006-10-01

    The genomic sequences of three species of Aspergillus, including the model organism A. nidulans (which is homothallic: having no differentiated mating types, a strain being able to cross with itself), suggest that A. fumigatus and A. oryzae, considered to be asexual, might in fact be heterothallic (having two differentiated mating types, a strain being able to cross only with strains of opposite mating type). The genomic data have implications for the understanding of the evolution and the mechanism of sexual reproduction in this genus. We propose a model of epigenetic heterothallism to account for the reproductive patterns observed in Aspergillus nidulans. PMID:16911845

  2. Genomic analysis of aspergillus flavus pathogenesis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and Fusarium verticillioides colonize developing maize seeds and contaminate them with mycotoxins. Maize genotypes differ in resistance to these fungi, but incorporation of adequate resistance into desirable hybrids has been challenging.Little is known about pathogenesis of seeds...

  3. GOLD: The Genomes Online Database

    DOE Data Explorer

    Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex

    Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to Archaea, Bacteria, Eukaryotic and more recently Metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited into the public archival sequence databases such as GenBank EMBL,and DDBJ. From the total of 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects. 1328 of those were bacterial, 59 archaeal and 771 eukaryotic projects. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]

    The basic tables in the GOLD database that can be browsed or searched include the following information:

    • Gold Stamp ID
    • Organism name
    • Domain
    • Links to information sources
    • Size and link to a map, when available
    • Chromosome number, Plas number, and GC content
    • A link for downloading the actual genome data
    • Institution that did the sequencing
    • Funding source
    • Database where information resides
    • Publication status and information

    • Maize Genetics and Genomics Database

      Technology Transfer Automated Retrieval System (TEKTRAN)

      The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently all sequence in the database comes from a PlantGDB pipeline and is presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

    • GOBASE: an organelle genome database

      PubMed Central

      O’Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

      2009-01-01

      The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (∼913 000) and chloroplast-encoded sequences (∼250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing data, with substitutions, insertions and deletions displayed using multiple alignments; (ii) addition of medically relevant information, such as haplotypes, SNPs and associated disease states, to human mitochondrial sequence data; (iii) addition of fully reannotated genome sequences for Escherichia coli and Nostoc sp., for reference and comparison; and (iv) a number of interface enhancements, such as the availability of both genomic and gene-coding sequence downloads, and a more sophisticated literature reference search functionality with links to PubMed where available. Future projects include the transfer of GOBASE features to NCBI/GenBank, allowing long-term preservation of accumulated expert information. The GOBASE database can be found at http://gobase.bcm.umontreal.ca/. Queries about custom and large-scale data retrievals should be addressed to gobase@bch.umontreal.ca. PMID:18953030

    • What can comparative genomics tell us about species concepts in the genus Aspergillus?

      SciTech Connect

      Rokas, Antonis; payne, gary; Federova, Natalie D.; Baker, Scott E.; Machida, Masa; yu, Jiujiang; georgianna, D. R.; Dean, Ralph A.; Bhatnagar, Deepak; Cleveland, T. E.; Wortman, Jennifer R.; Maiti, R.; Joardar, V.; Amedeo, Paolo; Denning, David W.; Nierman, William C.

      2007-12-15

      Understanding the nature of species" boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.

    • Sexual structures in Aspergillus: morphology, importance and genomics.

      PubMed

      Geiser, David M

      2009-01-01

      The genus Aspergillus comprises a few hundred species sharing a common asexual spore forming structure, the aspergillum. Approximately one-third of these species also produce a sexual stage, all but five of which are known to be homothallic. Sexual stages associated with Aspergillus fall into approximately ten different genera, reflecting a tremendous degree of phylogenetic and biological diversity. Sexual stages in Aspergillus are plectomycetous, typical for the order in which it resides, the Eurotiales. Theoretically, a homothallic Aspergillus species can produce both asexual conidia and sexual ascospores in both clonal and recombinant fashion, although the actual significance of these potential modes of reproduction is unclear. Aspergillus species with known sexual stages tend to be minor players in infections of humans, perhaps because of their tendency to produce fewer asexual spores compared to their non-teleomorphic congeners. The discovery of population genetic and genomic evidence for sex in species with no known sexual stage indicates that no assumptions can be made about the clonal versus recombinant life histories of a species based on its known mitotic and/or meiotic reproductive modes. PMID:18608901

    • The 2008 update of the Aspergillus nidulans genome annotation: a community effort

      PubMed Central

      Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J.M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W.J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A.K.W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Kraševec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J.; Ram, Arthur F.J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; Solingen, Piet van; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A.; Visser, Hans; van de Vondervoort, Peter J.I.; de Vries, Ronald P.; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A.M.J.J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey

      2010-01-01

      The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. PMID:19146970

    • Comparative Genomics of Aspergillus flavus and A. oryzae: An Early View

      Technology Transfer Automated Retrieval System (TEKTRAN)

      Aspergillus flavus produces aflatoxins and is the second leading cause of aspergillosis in immunocompromised individuals. Aspergillus oryzae, on the other hand, has been used for centuries in Japan for the fermentation of food. The recently available whole genome sequences of Aspergillus flavus an...

    • Impact of Aspergillus oryzae genomics on industrial production of metabolites.

      PubMed

      Abe, Keietsu; Gomi, Katusya; Hasegawa, Fumihiko; Machida, Masayuki

      2006-09-01

      Aspergillus oryzae is used extensively for the production of the traditional Japanese fermented foods sake (rice wine), shoyu (soy sauce), and miso (soybean paste). In recent years, recombinant DNA technology has been used to enhance industrial enzyme production by A. oryzae. Recently completed genomic studies using expressed sequence tag (EST) analyses and whole-genome sequencing are quickly expanding the industrial potential of the fungus in biotechnology. Genes that have been newly discovered through genome research can be used for the production of novel valuable enzymes and chemicals, and are important for designing new industrial processes. This article describes recent progress of A . oryzae genomics and its impact on industrial production of enzymes, metabolites, and bioprocesses. PMID:16944282

    • Searching and Indexing Genomic Databases via Kernelization

      PubMed Central

      Gagie, Travis; Puglisi, Simon J.

      2015-01-01

      The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001

    • Aspergillus Niger Genomics: Past, Present and into the Future

      SciTech Connect

      Baker, Scott E.

      2006-09-01

      Aspergillus niger is a filamentous ascomycete fungus that is ubiquitous in the environment and has been implicated in opportunistic infections of humans. In addition to its role as an opportunistic human pathogen, A. niger is economically important as a fermentation organism used for the production of citric acid. Industrial citric acid production by A. niger represents one of the most efficient, highest yield bioprocesses in use currently by industry. The genome size of A. niger is estimated to be between 35.5 and 38.5 megabases (Mb) divided among eight chromosomes/linkage groups that vary in size from 3.5 - 6.6 Mb. Currently, there are three independent A. niger genome projects, an indication of the economic importance of this organism. The rich amount of data resulting from these multiple A. niger genome sequences will be used for basic and applied research programs applicable to fermentation process development, morphology and pathogenicity.

    • The complete mitochondrial genome sequence of Aspergillus flavus.

      PubMed

      Yan, Zhengsong; Chen, Dan; Shen, Yiping; Ye, Baodong

      2016-07-01

      Aspergillus flavus is a haploid filamentous fungus that is common in the environment and has been implicated in human infections. The complete mitochondrial genome of A. flavus has been determined by high-throughput sequencing technology in this work. Our study revealed that the mitochondrial genome of A. flavus is 31,602 bp long, with an A + T content of 74.83%, which consists of a usual set of mitochondrial proteins and RNA genes, including large and small ribosomal RNAs, 15 proteins, and 20 tRNA genes and contains two introns. Notably, it also contains two hypothetical proteins without obvious homology to any known proteins. All structural genes are located on one strand and are apparently transcribed in one direction. Codon usage analysis indicated that all protein coding genes employ the standard fungal mitochondrial start and stop codons; and the nucleotide bias toward AT was also reflected in the codon usage. The complete mitochondrial genomes of A. flavus would be useful for future investigation of the genetic, evolution, and clinical identification of Aspergillus species. PMID:25922962

    • Genomic Islands in the Pathogenic Filamentous Fungus Aspergillus fumigatus

      PubMed Central

      Fedorova, Natalie D.; Khaldi, Nora; Joardar, Vinita S.; Maiti, Rama; Amedeo, Paolo; Anderson, Michael J.; Crabtree, Jonathan; Silva, Joana C.; Badger, Jonathan H.; Albarraq, Ahmed; Angiuoli, Sam; Bussey, Howard; Bowyer, Paul; Cotty, Peter J.; Dyer, Paul S.; Egan, Amy; Galens, Kevin; Fraser-Liggett, Claire M.; Haas, Brian J.; Inman, Jason M.; Kent, Richard; Lemieux, Sebastien; Malavazi, Iran; Orvis, Joshua; Roemer, Terry; Ronning, Catherine M.; Sundaram, Jaideep P.; Sutton, Granger; Turner, Geoff; Venter, J. Craig; White, Owen R.; Whitty, Brett R.; Youngman, Phil; Wolfe, Kenneth H.; Goldman, Gustavo H.; Wortman, Jennifer R.; Jiang, Bo; Denning, David W.; Nierman, William C.

      2008-01-01

      We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”. PMID:18404212

    • Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

      PubMed Central

      Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

      2016-01-01

      We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

    • Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

      PubMed

      Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

      2016-01-01

      We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

    • Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

      DOE PAGESBeta

      Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

      2015-03-13

      In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involvedmore » in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.« less

    • Plant cytogenetics in genome databases

      Technology Transfer Automated Retrieval System (TEKTRAN)

      Cytogenetic maps provide an integrated representation of genetic and cytological information that can be used to enhance genome and chromosome research. As genome analysis technologies become more affordable, the density of markers on cytogenetic maps increases, making these resources more useful a...

    • Complete mitochondrial genome of an Amynthas earthworm, Amynthas aspergillus (Oligochaeta: Megascolecidae).

      PubMed

      Zhang, Liangliang; Jiang, Jibao; Dong, Yan; Qiu, Jiangping

      2016-05-01

      We have determined the mitochondrial genome of the first Amynthas earthworm, Amynthas aspergillus (Perrier, 1872), which is a natural medical resource in Chinese traditional medicine. Its mitogenome is 15,115 bp in length containing 37 genes with the same contents and order as other sequenced earthworms. All genes are encoded by the same strand, all 13 PCGs use ATG as start codon. The content of A + T is 63.04% for A. aspergillus (33.41% A, 29.63% T, 14.56% G and 22.41% C). The complete mitochondrial genomes of A. aspergillus would be useful for the reconstruction of Oligochaeta polygenetic relationships. PMID:25329289

    • Genome Statute and Legislation Database

      MedlinePlus

      ... of page Last Reviewed: February 29, 2016 Get Email Updates Advancing human health through genomics research Privacy Copyright Contact Accessibility Plug-ins Site Map Staff Directory FOIA Share Top

  1. BGD: A Database of Bat Genomes

    PubMed Central

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/. PMID:26110276

  2. BGD: a database of bat genomes.

    PubMed

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/. PMID:26110276

  3. The UCSC Genome Browser database: 2016 update.

    PubMed

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  4. The UCSC Genome Browser database: 2016 update

    PubMed Central

    Speir, Matthew L.; Zweig, Ann S.; Rosenbloom, Kate R.; Raney, Brian J.; Paten, Benedict; Nejad, Parisa; Lee, Brian T.; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S.; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A.; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  5. The Saccharomyces Genome Database Variant Viewer.

    PubMed

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  6. GWIDD: Genome-wide protein docking database

    PubMed Central

    Kundrotas, Petras J.; Zhu, Zhengwei; Vakser, Ilya A.

    2010-01-01

    Structural information on interacting proteins is important for understanding life processes at the molecular level. Genome-wide docking database is an integrated resource for structural studies of protein–protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. Current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanned over the entire universe of life from viruses to humans. Data are organized in a relational database with user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu. PMID:19900970

  7. The UCSC Genome Browser database: 2014 update

    PubMed Central

    Karolchik, Donna; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S.; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hinrichs, Angie S.; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Raney, Brian J.; Rhead, Brooke; Rosenbloom, Kate R.; Sloan, Cricket A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser’s web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation ‘tracks’ for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany. PMID:24270787

  8. WGE: a CRISPR database for genome engineering

    PubMed Central

    Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J.; Skarnes, William C.; Iyer, Vivek

    2015-01-01

    Summary: The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. Availability and implementation: The WGE database is freely available at www.sanger.ac.uk/htgt/wge Contact: vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25979474

  9. Genome Sequencing and Evolutionary Analysis of Marine Gut Fungus Aspergillus sp. Z5 from Ligia oceanica.

    PubMed

    Li, Xue; Xu, Jin-Zhong; Wang, Wen-Jie; Chen, Yi-Wang; Zheng, Dao-Qiong; Di, Ya-Nan; Li, Ping; Wang, Pin-Mei; Li, Yu-Dong

    2016-01-01

    Aspergillus sp. Z5, isolated from the gut of marine isopods, produces prolific secondary metabolites with new structure and bioactivity. Here, we report the draft sequence of the approximately 33.8-Mbp genome of this strain. To the best of our knowledge, this is the first genome sequence of Aspergillus strain isolated from marine isopod Ligia oceanica. The phylogenetic analysis supported that this strain was closely related to A. versicolor, and genomic analysis revealed that Aspergillus sp. Z5 shared a high degree of colinearity with the genome of A. sydowii. Our results may facilitate studies on discovering the biosynthetic pathways of secondary metabolites and elucidating their evolution in this species. PMID:27081303

  10. Genome Sequencing and Evolutionary Analysis of Marine Gut Fungus Aspergillus sp. Z5 from Ligia oceanica

    PubMed Central

    Li, Xue; Xu, Jin-Zhong; Wang, Wen-Jie; Chen, Yi-Wang; Zheng, Dao-Qiong; Di, Ya-Nan; Li, Ping; Wang, Pin-Mei; Li, Yu-Dong

    2016-01-01

    Aspergillus sp. Z5, isolated from the gut of marine isopods, produces prolific secondary metabolites with new structure and bioactivity. Here, we report the draft sequence of the approximately 33.8-Mbp genome of this strain. To the best of our knowledge, this is the first genome sequence of Aspergillus strain isolated from marine isopod Ligia oceanica. The phylogenetic analysis supported that this strain was closely related to A. versicolor, and genomic analysis revealed that Aspergillus sp. Z5 shared a high degree of colinearity with the genome of A. sydowii. Our results may facilitate studies on discovering the biosynthetic pathways of secondary metabolites and elucidating their evolution in this species. PMID:27081303

  11. What Can Comparative Genomics Tell Us About Species Concepts in the Genus Aspergillus?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Understanding the nature of species’ boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four ca...

  12. CottonDB: Cotton Genome Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonDB (www.cottondb.org) is the first and most comprehensive source of cotton genome information. CottonDB is maintained at the Southern Plains Agricultural Research Center in College Station, TX. The project includes a website and database creating a repository of information for over 355,000 ...

  13. The UCSC Genome Browser database: 2015 update.

    PubMed

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  14. The UCSC Genome Browser database: 2015 update

    PubMed Central

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  15. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource. PMID:26223881

  16. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management, however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm pair-wise, overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).Inc. PMID:25560631

  17. The Saccharomyces Genome Database: Exploring Genome Features and Their Annotations.

    PubMed

    Cherry, J Michael

    2015-12-01

    Genomic-scale assays result in data that provide information over the entire genome. Such base pair resolution data cannot be summarized easily except via a graphical viewer. A genome browser is a tool that displays genomic data and experimental results as horizontal tracks. Genome browsers allow searches for a chromosomal coordinate or a feature, such as a gene name, but they do not allow searches by function or upstream binding site. Entry into a genome browser requires that you identify the gene name or chromosomal coordinates for a region of interest. A track provides a representation for genomic results and is displayed as a row of data shown as line segments to indicate regions of the chromosome with a feature. Another type of track presents a graph or wiggle plot that indicates the processed signal intensity computed for a particular experiment or set of experiments. Wiggle plots are typical for genomic assays such as the various next-generation sequencing methods (e.g., chromatin immunoprecipitation [ChIP]-seq or RNA-seq), where it represents a peak of DNA binding, histone modification, or the mapping of an RNA sequence. Here we explore the browser that has been built into the Saccharomyces Genome Database (SGD). PMID:26631126

  18. Complete Genome Sequence of Soil Fungus Aspergillus terreus (KM017963), a Potent Lovastatin Producer

    PubMed Central

    Bhargavi, S. D.; Praveen, V. K.

    2016-01-01

    We report the complete genome of Aspergillus terreus (KM017963), a tropical soil isolate. The genome sequence is 29 Mb, with a G+C content of 51.12%. The genome sequence of A. terreus shows the presence of the complete gene cluster responsible for lovastatin (an anti-cholesterol drug) production in a single scaffold (1.16). PMID:27284150

  19. Complete Genome Sequence of Soil Fungus Aspergillus terreus (KM017963), a Potent Lovastatin Producer.

    PubMed

    Savitha, Janakiraman; Bhargavi, S D; Praveen, V K

    2016-01-01

    We report the complete genome of Aspergillus terreus (KM017963), a tropical soil isolate. The genome sequence is 29 Mb, with a G+C content of 51.12%. The genome sequence of A. terreus shows the presence of the complete gene cluster responsible for lovastatin (an anti-cholesterol drug) production in a single scaffold (1.16). PMID:27284150

  20. The mouse genome informatics and the mouse genome database

    SciTech Connect

    Maltais, L.J.; Blackburn, R.E.; Bradt, D.W.

    1994-09-01

    The Mouse Genome Database (MGD) is a centralized, comprehensive database of the mouse genome that includes genetic mapping data, comparative mapping data, gene descriptions, mutant phenotype descriptions, strains and allelic polymorphism data, inbred strain characteristics, physical mapping data, and molecular probes and clones data. Data in MGD are obtained from the published literature and by electronic transfer from laboratories working on large backcross panels of mice. MGD provides tools that enable the user to search the database, retrieve data, generate reports, analyze data, annotate records, and build genetic maps. The Encyclopedia of the Mouse Genome provides a graphic user interface to mouse genome data. It consists of software tools including: LinkMap, a graphic display of genetic linkage maps with the ability to magnify regions of high locus density: CytoMap, a graphic display of cytogenetic maps showing banded chromosomes with cytogenetic locations of genes and chromosomal aberrations; CATS, a catalog searching tool for text retrieval of mouse locus descriptions. These software tools provide access to the following data sets: Chromosome Committee Reports, MIT Genome Center data, GBASE reports, Mouse Locus Catalog (MLC), and Mouse Cytogenetic Mapping Data. The MGD is available to the scientific community through the World Wide Web (WWW) and Gopher. In addition GBASE can be accessed via the Internet.

  1. The population genomics of mycotoxin diversity in Aspergillus flavus and Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycotoxins, and especially the aflatoxins, are an enormous problem in agriculture, with aflatoxin B1 being the most carcinogenic known natural compound. The worldwide costs associated with aflatoxin monitoring and crop losses are in the hundreds of millions of dollars. Aspergillus flavus and A. par...

  2. Orchidstra: an integrated orchid functional genomics database.

    PubMed

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-02-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species. PMID:23324169

  3. How to use the Candida Genome Database

    PubMed Central

    Skrzypek, Marek S.; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Summary Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  4. How to Use the Candida Genome Database.

    PubMed

    Skrzypek, Marek S; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  5. Requirements and standards for organelle genome databases

    SciTech Connect

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles)descended from prokaryotes that adopted an intracellular, endosymbioticlifestyle within early eukaryotes. Comparisons of their remnant genomesaddress a wide variety of biological questions, especially when includingthe genomes of their prokaryotic relatives and the many genes transferredto the eukaryotic nucleus during the transitions from endosymbiont toorganelle. The pace of producing complete organellar genome sequences nowmakes it unfeasible to do broad comparisons using the primary literatureand, even if it were feasible, it is now becoming uncommon for journalsto accept detailed descriptions of genome-level features. Unfortunatelyno database is currently useful for this task, since they have littlestandardization and are riddled with error. Here I outline what iscurrently wrong and what must be done to make this data useful to thescientific community.

  6. Draft Genome Sequence of Aspergillus niger Strain An76

    PubMed Central

    Gong, Weili; Cheng, Zhi; Zhang, Huaiqiang; Liu, Lin; Gao, Peiji

    2016-01-01

    The filamentous fungus Aspergillus niger has become one of the most important fungi in industrial biotechnology, and it can efficiently secrete both polysaccharide-degrading enzymes and organic acids. We report here the 6,074,961,332-bp draft sequence of A. niger strain An76, and the findings provide important information related to its lignocellulose-degrading ability. PMID:26893421

  7. Potential of Aspergillus flavus Genomics for Applications in Biotechnology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a common saprophyte and opportunistic pathogen that survives in the natural environment by extracting nutrition from plant debris, insect carcasses and a variety of other carbon sources. A. flavus produces numerous secondary metabolites and hydrolytic enzymes. The primary obj...

  8. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. PMID:27417828

  9. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station

    PubMed Central

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. PMID:27417828

  10. Advances in Aspergillus secondary metabolite research in the post-genomic era

    PubMed Central

    Sanchez, James F.; Somoza, Amber D.; Keller, Nancy P.

    2015-01-01

    This review studies the impact of whole genome sequencing on Aspergillus secondary metabolite research. There has been a proliferation of many new, intriguing discoveries since sequencing data became widely available. What is more, the genomes disclosed the surprising finding that there are many more secondary metabolite biosynthetic pathways than laboratory research had suggested. Activating these pathways has been met with some success, but many more dormant genes remain to be awakened. PMID:22228366

  11. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated from Peanut Seeds in Georgia

    PubMed Central

    Wang, Xinye Monica; Palencia, Edwin R.

    2016-01-01

    Aspergillus flavus and A. parasiticus fungi produce carcinogenic mycotoxins in peanut seeds, causing considerable impact on both human health and the economy. Here, we report nine genome sequences of Aspergillus spp., isolated from Georgia peanut seeds in 2014. The information obtained will lead to further biodiversity studies that are essential for developing control strategies. PMID:27081142

  12. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and A. parasiticus fungi, carcinogen-mycotoxins producers, infect peanut seeds, causing considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  13. Genomic Context of Azole Resistance Mutations in Aspergillus fumigatus Determined Using Whole-Genome Sequencing

    PubMed Central

    Abdolrasouli, Alireza; Rhodes, Johanna; Beale, Mathew A.; Hagen, Ferry; Rogers, Thomas R.; Chowdhary, Anuradha; Meis, Jacques F.

    2015-01-01

    ABSTRACT A rapid and global emergence of azole resistance has been observed in the pathogenic fungus Aspergillus fumigatus over the past decade. The dominant resistance mechanism appears to be of environmental origin and involves mutations in the cyp51A gene, which encodes a protein targeted by triazole antifungal drugs. Whole-genome sequencing (WGS) was performed for high-resolution single-nucleotide polymorphism (SNP) analysis of 24 A. fumigatus isolates, including azole-resistant and susceptible clinical and environmental strains obtained from India, the Netherlands, and the United Kingdom, in order to assess the utility of WGS for characterizing the alleles causing resistance. WGS analysis confirmed that TR34/L98H (a mutation comprising a tandem repeat [TR] of 34 bases in the promoter of the cyp51A gene and a leucine-to-histidine change at codon 98) is the sole mechanism of azole resistance among the isolates tested in this panel of isolates. We used population genomic analysis and showed that A. fumigatus was panmictic, with as much genetic diversity found within a country as is found between continents. A striking exception to this was shown in India, where isolates are highly related despite being isolated from both clinical and environmental sources across >1,000 km; this broad occurrence suggests a recent selective sweep of a highly fit genotype that is associated with the TR34/L98H allele. We found that these sequenced isolates are all recombining, showing that azole-resistant alleles are segregating into diverse genetic backgrounds. Our analysis delineates the fundamental population genetic parameters that are needed to enable the use of genome-wide association studies to identify the contribution of SNP diversity to the generation and spread of azole resistance in this medically important fungus. PMID:26037120

  14. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  15. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  16. Genome-wide analysis of the Zn(II)₂Cys₆ zinc cluster-encoding gene family in Aspergillus flavus.

    PubMed

    Chang, Perng-Kuang; Ehrlich, Kenneth C

    2013-05-01

    Proteins with a Zn(II)₂Cys₆ domain, Cys-X₂-Cys-X₆-Cys-X₅₋₁₂-Cys-X₂-Cys-X₆₋₉-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overview of this family of genes. Annotation of this gene family in most fungal genomes is still far from perfect and refined bioinformatic algorithms are urgently needed. Aspergillus flavus is a saprophytic soil fungus that can produce the carcinogenic aflatoxin. It is the second leading causative agent of invasive aspergillosis. The 37-Mb genome of A. flavus is predicted to encode 12,000 proteins. Two and a half percent of the total proteins are estimated to contain the C6 domain, more than twofold greater than those estimated for yeast, which is about 1 %. The variability in the spacing between cysteines, C₃-C₄ and C₅-C₆, in the zinc cluster enables classification of the domains into distinct subgroups, which are also well conserved in Aspergillus nidulans. Sixty-six percent (202/306) of the A. flavus C6 proteins contain a specific transcription factor domain, and 7 % contain a domain of unknown function, DUF3468. Two A. nidulans C6 proteins containing the DUF3468 are involved in asexual conidiation and another two in sexual differentiation. In the anamorphic A. flavus, a homolog of the latter lacks the C6 domain. A. flavus being heterothallic and reproducing mainly through conidiation appears to have lost some components involved in homothallic sexual development. Of the 55 predicted gene clusters thought to be involved in production of secondary metabolites, only about half have a C6-encoding gene in or near the gene clusters. The features revealed by the A. flavus C6 proteins likely are common for other ascomycete fungi. PMID:23563886

  17. HGT-Finder: A New Tool for Horizontal Gene Transfer Finding and Application to Aspergillus genomes

    PubMed Central

    Nguyen, Marcus; Ekstrom, Alex; Li, Xueqiong; Yin, Yanbin

    2015-01-01

    Horizontal gene transfer (HGT) is a fast-track mechanism that allows genetically unrelated organisms to exchange genes for rapid environmental adaptation. We developed a new phyletic distribution-based software, HGT-Finder, which implements a novel bioinformatics algorithm to calculate a horizontal transfer index and a probability value for each query gene. Applying this new tool to the Aspergillus fumigatus, Aspergillus flavus, and Aspergillus nidulans genomes, we found 273, 542, and 715 transferred genes (HTGs), respectively. HTGs have shorter length, higher guanine-cytosine (GC) content, and relaxed selection pressure. Metabolic process and secondary metabolism functions are significantly enriched in HTGs. Gene clustering analysis showed that 61%, 41% and 74% of HTGs in the three genomes form physically linked gene clusters (HTGCs). Overlapping manually curated, secondary metabolite gene clusters (SMGCs) with HTGCs found that 9 of the 33 A. fumigatus SMGCs and 31 of the 65 A. nidulans SMGCs share genes with HTGCs, and that HTGs are significantly enriched in SMGCs. Our genome-wide analysis thus presented very strong evidence to support the hypothesis that HGT has played a very critical role in the evolution of SMGCs. The program is freely available at http://cys.bios.niu.edu/HGTFinder/HGTFinder.tar.gz. PMID:26473921

  18. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Gruber, C.; Geier, B.; Haase, D.; Kaps, A.; Lemcke, K.; Mannhaupt, G.; Pfeiffer, F.; Schüller, C.; Stocker, S.; Weil, B.

    2000-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www. mips.biochem.mpg.de ). PMID:10592176

  19. Signaling pathways for stress responses and adaptation in Aspergillus species: stress biology in the post-genomic era.

    PubMed

    Hagiwara, Daisuke; Sakamoto, Kazutoshi; Abe, Keietsu; Gomi, Katsuya

    2016-09-01

    Aspergillus species are among the most important filamentous fungi in terms of industrial use and because of their pathogenic or toxin-producing features. The genomes of several Aspergillus species have become publicly available in this decade, and genomic analyses have contributed to an integrated understanding of fungal biology. Stress responses and adaptation mechanisms have been intensively investigated using the accessible genome infrastructure. Mitogen-activated protein kinase (MAPK) cascades have been highlighted as being fundamentally important in fungal adaptation to a wide range of stress conditions. Reverse genetics analyses have uncovered the roles of MAPK pathways in osmotic stress, cell wall stress, development, secondary metabolite production, and conidia stress resistance. This review summarizes the current knowledge on the stress biology of Aspergillus species, illuminating what we have learned from the genomic data in this "post-genomic era." PMID:27007956

  20. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    PubMed Central

    Guo, Chun-Jun; Wang, Clay C. C.

    2014-01-01

    Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is the well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs, including acetylaranotin, butyrolactones, and territram, with interesting bioactivities. This review will cover recent progress in genome mining of SMs identified in this fungus. The identification and characterization of the gene cluster for these SMs, as well as the proposed biosynthetic pathways, will be discussed in depth. PMID:25566227

  1. CADRE: the Central Aspergillus Data REpository 2012.

    PubMed

    Mabey Gilsenan, Jane; Cooley, John; Bowyer, Paul

    2012-01-01

    The Central Aspergillus Data REpository (CADRE; http://www.cadre-genomes.org.uk) is a public resource for genomic data extracted from species of Aspergillus. It provides an array of online tools for searching and visualising features of this significant fungal genus. CADRE arose from a need within the medical community to understand the human pathogen Aspergillus fumigatus. Due to the paucity of Aspergillus genomic resources 10 years ago, the long-term goal of this project was to collate and maintain Aspergillus genomes as they became available. Since our first release in 2004, the resource has expanded to encompass annotated sequence for eight other Aspergilli and provides much needed support to the international Aspergillus research community. Recent developments, however, in sequencing technology are creating a vast amount of genomic data and, as a result, we shortly expect a tidal wave of Aspergillus data. In preparation for this, we have upgraded the database and software suite. This not only enables better management of more complex data sets, but also improves annotation by providing access to genome comparison data and the integration of high-throughput data. PMID:22080563

  2. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  3. Draft Genome Sequences of Two Closely Related Aflatoxigenic Aspergillus Species Obtained from the Ivory Coast.

    PubMed

    Moore, Geromy G; Mack, Brian M; Beltz, Shannon B

    2016-03-01

    Aspergillus ochraceoroseus and Aspergillus rambellii were isolated from soil detritus in Taï National Park, Ivory Coast, Africa. The Type strain for each species happens to be the only representative ever sampled. Both species secrete copious amounts of aflatoxin B1 and sterigmatocystin, because each of their genomes contains clustered genes for biosynthesis of these mycotoxins. We sequenced their genomes using a personal genome machine and found them to be smaller in size (A. ochraceoroseus = 23.9 Mb and A. rambellii = 26.1 Mb), as well as in numbers of predicted genes (7,837 and 7,807, respectively), compared to other sequenced Aspergilli. Our findings also showed that the A. ochraceoroseus Type strain contains a single MAT1-1 gene, while the Type strain of A. rambellii contains a single MAT1-2 gene, indicating that these species are heterothallic (self-infertile). These draft genomes will be useful for understanding the genes and pathways necessary for the cosynthesis of these two toxic secondary metabolites as well as the evolution of these pathways in aflatoxigenic fungi. PMID:26637470

  4. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected, however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size. PMID:21952610

  5. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    PubMed

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  6. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  7. Uniform standards for genome databases in forest and fruit trees

    Technology Transfer Automated Retrieval System (TEKTRAN)

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  8. Design and implementation of the cacao genome database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  9. : a database of ciliate genome rearrangements

    PubMed Central

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F.

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad scale, somatic rearrangement events occur in many eukaryotic cells and tumors. The (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila. PMID:26586804

  10. The launch of the Aspergillus flavus genome browser and limited release of whole genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are produced by Aspergillus flavus and A. parasiticus. These toxic and carcinogenic compounds contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergilllus flavus wh...

  11. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  12. Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics.

    PubMed

    Ito, Yuichi; Arikawa, Kohji; Antonio, Baltazar A; Ohta, Isamu; Naito, Shinji; Mukai, Yoshiyuki; Shimano, Atsuko; Masukawa, Masatoshi; Shibata, Michie; Yamamoto, Mayu; Ito, Yukiyo; Yokoyama, Junri; Sakai, Yasumichi; Sakata, Katsumi; Nagamura, Yoshiaki; Namiki, Nobukazu; Matsumoto, Takashi; Higo, Kenichi; Sasaki, Takuji

    2005-01-01

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive effort of manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at chromosome and contig-level; (ii) concise visualization of the annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), homology to expressed sequence tag, full-length cDNA and protein; (iii) user-friendly clone / gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequence of other organisms. As of October 2004, the database contains a total of 215 Mb sequence with relevant annotation results including 30 000 manually curated genes. The database can provide the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. The database can be accessed at http://rad.dna.affrc.go.jp/. PMID:15608281

  13. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  14. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  15. The UCSC Genome Browser database: extensions and updates 2013

    PubMed Central

    Meyer, Laurence R.; Zweig, Ann S.; Hinrichs, Angie S.; Karolchik, Donna; Kuhn, Robert M.; Wong, Matthew; Sloan, Cricket A.; Rosenbloom, Kate R.; Roe, Greg; Rhead, Brooke; Raney, Brian J.; Pohl, Andy; Malladi, Venkat S.; Li, Chin H.; Lee, Brian T.; Learned, Katrina; Kirkup, Vanessa; Hsu, Fan; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Goldman, Mary; Giardine, Belinda M.; Fujita, Pauline A.; Dreszer, Timothy R.; Diekhans, Mark; Cline, Melissa S.; Clawson, Hiram; Barber, Galt P.; Haussler, David; Kent, W. James

    2013-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation ‘tracks’ are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal. PMID:23155063

  16. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences.Database URL: www.yeastgenome.org. PMID:27252399

  17. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  18. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861

  19. Aspergillus flavus Genomics for Development of Strategies to Interrupt Aflatoxin Formation and Discovery of Fungal Enzymes for Biofuel Production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus produces toxic and the most carcinogenic mycotoxins, the aflatoxins. The primary objectives of our A. flavus genomics program are to reduce and eliminate aflatoxin contamination in food and feed and control fungal infection in preharvest crops such as corn, cotton, peanut and tre...

  20. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated from Peanut Seeds in Georgia.

    PubMed

    Faustinelli, Paola C; Wang, Xinye Monica; Palencia, Edwin R; Arias, Renée S

    2016-01-01

    Aspergillus flavusandA. parasiticusfungi produce carcinogenic mycotoxins in peanut seeds, causing considerable impact on both human health and the economy. Here, we report nine genome sequences ofAspergillusspp., isolated from Georgia peanut seeds in 2014. The information obtained will lead to further biodiversity studies that are essential for developing control strategies. PMID:27081142

  1. Prokaryotic Genomes from Microbes Online Database

    DOE Data Explorer

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche, Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    To describe the potential functions of genes, MicrobesOnline includes protein family analyses (from InterPro and COG), metabolic maps (from KEGG), links to research papers (from UniProt and PubMed), and operon predictions for every genome. To examine each gene's evolutionary history, MicrobesOnline includes precomputed phylogenetic trees for all the gene families. It displays gene trees with genomic context or it compares the gene tree to the species tree. The tools provided with MicrobesOnline allow users to: compute customized motifs, sequence alignments, and phylogenetic trees change expression patterns in metabolic maps annotate genes in various ways. A browse tree tool and a genome browser are available, along with specialized search capabilities. (Specialized Interface)

  2. OryzaGenome: Genome Diversity Database of Wild Oryza Species

    PubMed Central

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype–phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/. PMID:26578696

  3. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  4. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    SciTech Connect

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involved in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.

  5. Databases of homologous gene families for comparative genomics

    PubMed Central

    Penel, Simon; Arigon, Anne-Muriel; Dufayard, Jean-François; Sertier, Anne-Sophie; Daubin, Vincent; Duret, Laurent; Gouy, Manolo; Perrière, Guy

    2009-01-01

    Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users high quality homologous families and sequence alignments as well as phylogenetic trees based on state of the art algorithms are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignments computation, and phylogenetic trees construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The later can be used to perform tree-pattern based searches allowing, among other uses, to retrieve sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at . PMID:19534752

  6. Megx.net: integrated database resource for marine ecological genomics.

    PubMed

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

  7. Megx.net: integrated database resource for marine ecological genomics

    PubMed Central

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

  8. Choosing a Genome Browser for a Model Organism Database (MOD): Surveying the Maize Community

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As the maize genome sequencing is nearing its completion, the Maize Genetics and Genomics Database (MaizeGDB), the Model Organism Database for maize, integrated a genome browser to its already existing Web interface and database. The addition of the MaizeGDB Genome Browser to MaizeGDB will allow it ...

  9. BBGD: an online database for blueberry genomic data

    PubMed Central

    Alkharouf, Nadim W; Dhanaraj, Anik L; Naik, Dhananjay; Overall, Chris; Matthews, Benjamin F; Rowland, Lisa J

    2007-01-01

    Background Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economical value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBDG) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making available ESTs and gene expression data in searchable formats and in elucidating the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. Description BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. Conclusion By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures. PMID:17263892

  10. CarrotDB: a genomic and transcriptomic database for carrot.

    PubMed

    Xu, Zhi-Sheng; Tan, Hua-Wei; Wang, Feng; Hou, Xi-Lin; Xiong, Ai-Sheng

    2014-01-01

    Carrot (Daucus carota L.) is an economically important vegetable worldwide and is the largest source of carotenoids and provitamin A in the human diet. Given the importance of this vegetable to humans, research and breeding communities on carrot should obtain useful genomic and transcriptomic information. The first whole-genome sequences of 'DC-27' carrot were de novo assembled and analyzed. Transcriptomic sequences of 14 carrot genotypes were downloaded from the Sequence Read Archive (SRA) database of National Center for Biotechnology Information (NCBI) and mapped to the whole-genome sequence before assembly. Based on these data sets, the first Web-based genomic and transcriptomic database for D. carota (CarrotDB) was developed (database homepage: http://apiaceae.njau.edu.cn/car rotdb). CarrotDB offers the tools of Genome Map and Basic Local Alignment Search Tool. Using these tools, users can search certain target genes and simple sequence repeats along with designed primers of 'DC-27'. Assembled transcriptomic sequences along with fragments per kilobase of transcript sequence per millions base pairs sequenced information (FPKM) information of 14 carrot genotypes are also provided. Users can download de novo assembled whole-genome sequences, putative gene sequences and putative protein sequences of 'DC-27'. Users can also download transcriptome sequence assemblies of 14 carrot genotypes along with their FPKM information. A total of 2826 transcription factor (TF) genes classified into 57 families were identified in the entire genome sequences. These TF genes were embedded in CarrotDB as an interface. The 'GERMPLASM' part of CarrotDB also offers taproot photos of 45 carrot genotypes and a table containing accession numbers, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype. CarrotDB will be continuously updated with new information. Database URL: http://apiaceae.njau.edu.cn/carrotdb/ PMID

  11. Querying genomic databases: refining the connectivity map.

    PubMed

    Segal, Mark R; Xiong, Hao; Bengtsson, Henrik; Bourgon, Richard; Gentleman, Robert

    2012-01-01

    The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge.The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across of spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation. Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query

  12. A primer on rapid prototyping of genomic databases in Prolog

    SciTech Connect

    Yoshida, Kaoru; Smith, C.L.; Overbeek, R.

    1992-01-01

    This report presents a tutorial on how one might create an integrated database of genomic information. We outline the required steps for implementation, give a brief introduction to Prolog, and discuss the query facility supported by our system. Our goal is to enable researchers to being constructing their own biological information system.

  13. MaizeGDB: The Maize Genetics and Genomics Database.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project’s website...

  14. MaizeGDB: The Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website...

  15. BRAD, the genetics and genomics database for Brassica plants

    PubMed Central

    2011-01-01

    Background Brassica species include both vegetable and oilseed crops, which are very important to the daily life of common human beings. Meanwhile, the Brassica species represent an excellent system for studying numerous aspects of plant biology, specifically for the analysis of genome evolution following polyploidy, so it is also very important for scientific research. Now, the genome of Brassica rapa has already been assembled, it is the time to do deep mining of the genome data. Description BRAD, the Brassica database, is a web-based resource focusing on genome scale genetic and genomic data for important Brassica crops. BRAD was built based on the first whole genome sequence and on further data analysis of the Brassica A genome species, Brassica rapa (Chiifu-401-42). It provides datasets, such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and from BAC clone sequences, predicted genes and associated annotations, non coding RNAs, transposable elements (TE), B. rapa genes' orthologous to those in A. thaliana, as well as genetic markers and linkage maps. BRAD offers useful searching and data mining tools, including search across annotation datasets, search for syntenic or non-syntenic orthologs, and to search the flanking regions of a certain target, as well as the tools of BLAST and Gbrowse. BRAD allows users to enter almost any kind of information, such as a B. rapa or A. thaliana gene ID, physical position or genetic marker. Conclusion BRAD, a new database which focuses on the genetics and genomics of the Brassica plants has been developed, it aims at helping scientists and breeders to fully and efficiently use the information of genome data of Brassica plants. BRAD will be continuously updated and can be accessed through http://brassicadb.org. PMID:21995777

  16. Integrated database for identifying candate genes for Aspergillus flavus resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent af...

  17. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  18. Genomic Databases and Biobanks in Denmark.

    PubMed

    Hartlev, Mette

    2015-01-01

    Biobanking in Denmark is regulated via patients' rights laws, data protection laws, and research ethics reviews. Danish law recognizes tissue samples as personal data for purposes of the data protection laws, meaning research with tissue samples may be subject to research ethics review, data protection laws, and patients' rights requirements depending on the circumstances of collection. However, research on information gained through whole genome sequencing is subject only to data protection laws, despite the similarity in the nature of the information. The regulatory framework treats biobank samples collected from patients differently than samples collected from research participants, particularly with respect to autonomy. Importantly, biobanks established for future unspecified research are not subject to research ethics review. Biobank-based research has gained more prominence on the national level recently, and the potential for a less fragmented and more consistent regulatory approach may emerge from this attention. PMID:26711414

  19. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material. PMID:27489953

  20. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  1. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database

    PubMed Central

    Winsor, Geoffrey L.; Griffiths, Emma J.; Lo, Raymond; Dhillon, Bhavjinder K.; Shay, Julie A.; Brinkman, Fiona S. L.

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  2. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i. e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the shear volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language. PMID:12542407

  3. Mouse Genome Database: from sequence to phenotypes and disease models

    PubMed Central

    Eppig, Janan T.; Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here we describe the data acquisition process, specifics about MGD’s key data areas, methods to access and query MGD data, and outreach and user help facilities. PMID:26150326

  4. Mouse Genome Database: From sequence to phenotypes and disease models.

    PubMed

    Eppig, Janan T; Richardson, Joel E; Kadin, James A; Smith, Cynthia L; Blake, Judith A; Bult, Carol J

    2015-08-01

    The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. PMID:26150326

  5. EU Laws on Privacy in Genomic Databases and Biobanking.

    PubMed

    Townend, David

    2016-03-01

    Both the European Union and the Council of Europe have a bearing on privacy in genomic databases and biobanking. In terms of legislation, the processing of personal data as it relates to the right to privacy is currently largely regulated in Europe by Directive 95/46/EC, which requires that processing be "fair and lawful" and follow a set of principles, meaning that the data be processed only for stated purposes, be sufficient for the purposes of the processing, be kept only for so long as is necessary to achieve those purposes, and be kept securely and only in an identifiable state for such time as is necessary for the processing. The European privacy regime does not require the de-identification (anonymization) of personal data used in genomic databases or biobanks, and alongside this practice informed consent as well as governance and oversight mechanisms provide for the protection of genomic data. PMID:27256129

  6. Development of genome viewer (Web Omics Viewer) for managing databases of cucumber genome

    NASA Astrophysics Data System (ADS)

    Wojcieszek, M.; RóŻ, P.; Pawełkowicz, M.; Nowak, R.; Przybecki, Z.

    Cucumber is an important plant in horticulture and science world. Sequencing projects of C. sativus genome enable new methodological aproaches in further investigation of this species. Accessibility is crucial to fully exploit obtained information about detail structure of genes, markers and other characteristic features such contigs, scaffolds and chromosomes. Genome viewer is one of tools providing plain and easy way for presenting genome data for users and for databases administration. Gbrowse - the main viewer has several very useful features but lacks in managing simplicity. Our group developed new genome browser Web Omics Viewer (WOV), keeping functionality but improving utilization and accessibility to cucumber genome data.

  7. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in human, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for fungal research community especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  8. DemaDb: an integrated dematiaceous fungal genomes database.

    PubMed

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in human, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for fungal research community especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my. PMID:26980516

  9. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics. PMID:23299411

  10. Genome Sequence of the White Koji Mold Aspergillus kawachii IFO 4308, Used for Brewing the Japanese Distilled Spirit Shochu

    PubMed Central

    Futagami, Taiki; Mori, Kazuki; Yamashita, Ayaka; Wada, Shotaro; Kajiwara, Yasuhiro; Takashita, Hideharu; Omori, Toshiro; Takegawa, Kaoru; Tashiro, Kosuke; Kuhara, Satoru; Goto, Masatoshi

    2011-01-01

    The filamentous fungus Aspergillus kawachii has traditionally been used for brewing the Japanese distilled spirit shochu. A. kawachii characteristically hyperproduces citric acid and a variety of polysaccharide glycoside hydrolases. Here the genome sequence of A. kawachii IFO 4308 was determined and annotated. Analysis of the sequence may provide insight into the properties of this fungus that make it superior for use in shochu production, leading to the further development of A. kawachii for industrial applications. PMID:22045919

  11. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination. PMID:16381882

  12. The Genome Database for Rosaceae (GDR): year 10 update.

    PubMed

    Jung, Sook; Ficklin, Stephen P; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G; Mueller, Lukas A; Olmstead, Mercy A; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http:/www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  13. The Genome Database for Rosaceae (GDR): year 10 update

    PubMed Central

    Jung, Sook; Ficklin, Stephen P.; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G.; Mueller, Lukas A.; Olmstead, Mercy A.; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http:/www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  14. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  15. Towards Interoperability in Genome Databases: The MAtDB (MIPS Arabidopsis Thaliana Database) Experience.

    PubMed

    Schoof, Heiko

    2003-01-01

    Increasing numbers of whole-genome sequences are available, but to interpret them fully requires more than listing all genes. Genome databases are faced with the challenges of integrating heterogenous data and enabling data mining. In comparison to a data warehousing approach, where integration is achieved through replication of all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These are important in a field where new data is generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data repository towards creating an integrated knowledge resource. To this end, the Arabidopsis genome has been a backbone against which to structure and integrate heterogenous data. The challenges to be met are continuous updating of data, the design of flexible data models that can evolve with new data, the integration of heterogenous data, e.g. through the use of ontologies, comprehensive views and visualization of complex information, simple interfaces for application access locally or via the Internet, and knowledge transfer across species. PMID:18629123

  16. Tripal: a construction toolkit for online genome databases.

    PubMed

    Ficklin, Stephen P; Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increases so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net. PMID:21959868

  17. TreeGenes: A Forest Tree Genome Database

    PubMed Central

    Wegrzyn, Jill L.; Lee, Jennifer M.; Tearse, Brandon R.; Neale, David B.

    2008-01-01

    The Dendrome Project and associated TreeGenes database serve the forest genetics research community through a curated and integrated web-based relational database. The research community is composed of approximately 2 000 members representing over 730 organizations worldwide. The database itself is composed of a wide range of genetic data from many forest trees with focused efforts on commercially important members of the Pinaceae family. The primary data types curated include species, publications, tree and DNA extraction information, genetic maps, molecular markers, ESTs, genotypic, and phenotypic data. There are currently ten main search modules or user access points within this PostgreSQL database. These access points allow users to navigate logically through the related data types. The goals of the Dendrome Project are to (1) provide a comprehensive resource for forest tree genomics data to facilitate gene discovery in related species, (2) develop interfaces that encourage the submission and integration of all genomic data, and to (3) centralize and distribute existing and novel online tools for the research community that both support and ease analysis. Recent developments have focused on increasing data content, functional annotations, data retrieval, and visualization tools. TreeGenes was developed to provide a centralized web resource with analysis and visualization tools to support data storage and exchange. PMID:18725987

  18. GOLD.db: genomics of lipid-associated disorders database

    PubMed Central

    Hackl, Hubert; Maurer, Michael; Mlecnik, Bernhard; Hartler, Jürgen; Stocker, Gernot; Miranda-Saavedra, Diego; Trajanoski, Zlatko

    2004-01-01

    Background The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis management, treatment, and prevention of lipid-associated disorders. Description The GOLD.db provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. This database provides also the possibility to map gene expression data individually to each pathway. Gene expression at different experimental conditions can be viewed sequentially in context of the pathway. Related large scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways. Conclusions The GOLD.db will be a valuable tool that allow researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways. PMID:15588328

  19. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as the suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between query and sequences in DB. In this paper we have described a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industries to improve marketing, sales, and customer support operations. We have applied the descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. Relevance feedback score based on common patterns is developed and employed to compute distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy and the experimental results are presented.

  20. A web-based genomic sequence database for the Streptomycetaceae: a tool for systematics and genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...

  1. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. PMID:25480115

  2. The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms.

    PubMed

    McGuffin, Liam J; Street, Stefano A; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2004-01-01

    Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The Gen THREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures. PMID:14681393

  3. Functional Genomic Analysis of Aspergillus flavus Interacting with Resistant and Susceptible Peanut

    PubMed Central

    Wang, Houmiao; Lei, Yong; Yan, Liying; Wan, Liyun; Ren, Xiaoping; Chen, Silong; Dai, Xiaofeng; Guo, Wei; Jiang, Huifang; Liao, Boshou

    2016-01-01

    In the Aspergillus flavus (A. flavus)–peanut pathosystem, development and metabolism of the fungus directly influence aflatoxin contamination. To comprehensively understand the molecular mechanism of A. flavus interaction with peanut, RNA-seq was used for global transcriptome profiling of A. flavus during interaction with resistant and susceptible peanut genotypes. In total, 67.46 Gb of high-quality bases were generated for A. flavus-resistant (af_R) and -susceptible peanut (af_S) at one (T1), three (T2) and seven (T3) days post-inoculation. The uniquely mapped reads to A. flavus reference genome in the libraries of af_R and af_S at T2 and T3 were subjected to further analysis, with more than 72% of all obtained genes expressed in the eight libraries. Comparison of expression levels both af_R vs. af_S and T2 vs. T3 uncovered 1926 differentially expressed genes (DEGs). DEGs associated with mycelial growth, conidial development and aflatoxin biosynthesis were up-regulated in af_S compared with af_R, implying that A. flavus mycelia more easily penetrate and produce much more aflatoxin in susceptible than in resistant peanut. Our results serve as a foundation for understanding the molecular mechanisms of aflatoxin production differences between A. flavus-R and -S peanut, and offer new clues to manage aflatoxin contamination in crops. PMID:26891328

  4. Random Mutagenesis of the Aspergillus oryzae Genome Results in Fungal Antibacterial Activity.

    PubMed

    Leonard, Cory A; Brown, Stacy D; Hayman, J Russell

    2013-01-01

    Multidrug-resistant bacteria cause severe infections in hospitals and communities. Development of new drugs to combat resistant microorganisms is needed. Natural products of microbial origin are the source of most currently available antibiotics. We hypothesized that random mutagenesis of Aspergillus oryzae would result in secretion of antibacterial compounds. To address this hypothesis, we developed a screen to identify individual A. oryzae mutants that inhibit the growth of Methicillin-resistant Staphylococcus aureus (MRSA) in vitro. To randomly generate A. oryzae mutant strains, spores were treated with ethyl methanesulfonate (EMS). Over 3000 EMS-treated A. oryzae cultures were tested in the screen, and one isolate, CAL220, exhibited altered morphology and antibacterial activity. Culture supernatant from this isolate showed antibacterial activity against Methicillin-sensitive Staphylococcus aureus, MRSA, and Pseudomonas aeruginosa, but not Klebsiella pneumonia or Proteus vulgaris. The results of this study support our hypothesis and suggest that the screen used is sufficient and appropriate to detect secreted antibacterial fungal compounds resulting from mutagenesis of A. oryzae. Because the genome of A. oryzae has been sequenced and systems are available for genetic transformation of this organism, targeted as well as random mutations may be introduced to facilitate the discovery of novel antibacterial compounds using this system. PMID:23983696

  5. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  6. Addition of a breeding database in the Genome Database for Rosaceae

    PubMed Central

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  7. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    PubMed

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies. PMID:26481361

  8. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome

    PubMed Central

    Elsik, Christine G.; Unni, Deepak R.; Diesh, Colin M.; Tayal, Aditi; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies. PMID:26481361

  9. MonarchBase: the monarch butterfly genome database

    PubMed Central

    Zhan, Shuai; Reppert, Steven M.

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly’s migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations. PMID:23143105

  10. MonarchBase: the monarch butterfly genome database.

    PubMed

    Zhan, Shuai; Reppert, Steven M

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly's migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations. PMID:23143105

  11. Gene Variant Databases and Sharing: Creating a Global Genomic Variant Database for Personalized Medicine.

    PubMed

    Bean, Lora J H; Hegde, Madhuri R

    2016-06-01

    Revolutionary changes in sequencing technology and the desire to develop therapeutics for rare diseases have led to the generation of an enormous amount of genomic data in the last 5 years. Large-scale sequencing done in both research and diagnostic laboratories has linked many new genes to rare diseases, but has also generated a number of variants that we cannot interpret today. It is clear that we remain a long way from a complete understanding of the genomic variation in the human genome and its association with human health and disease. Recent studies identified susceptibility markers to infectious diseases and also the contribution of rare variants to complex diseases in different populations. The sequencing revolution has also led to the creation of a large number of databases that act as "keepers" of data, and in many cases give an interpretation of the effect of the variant. This interpretation is based on reports in the literature, prediction models, and in some cases is accompanied by functional evidence. As we move toward the practice of genomic medicine, and consider its place in "personalized medicine," it is time to ask ourselves how we can aggregate this wealth of data into a single database for multiple users with different goals. PMID:26931283

  12. GOBASE—a database of organelle and bacterial genome information

    PubMed Central

    O'Brien, Emmet A.; Zhang, Yue; Yang, LiuSong; Wang, Eric; Marie, Veronique; Lang, B. Franz; Burger, Gertraud

    2006-01-01

    The organelle genome database GOBASE is now in its twelfth release, and includes 350 000 mitochondrial sequences and 118 000 chloroplast sequences, roughly a 3-fold expansion since previously documented. GOBASE also includes a fully reannotated genome sequence of Rickettsia prowazekii, one of the closest bacterial relatives of mitochondria, and will shortly expand to contain more data from bacteria from which organelles originated. All these sequences are now accessible through a single unified interface. Enhancements to the functionality of GOBASE include addition of pages for RNA structures and a page compiling data about the taxonomic distribution of organelle-encoded genes; incorporation of Gene Ontology terms; addition of features deduced from incomplete annotations to sequences in GenBank; marking of type examples in cases where single genes in single species are oversampled within GenBank; and addition of graphics illustrating gene structure and the position of neighbouring genes on a sequence. The database has been reimplemented in PostgreSQL to facilitate development and maintenance, and structural modifications have been made to speed up queries, particularly those related to taxonomy. The GOBASE database can be queried at and inquiries should be directed to gobase@bch.umontreal.ca. PMID:16381962

  13. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼ 19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  14. DoGSD: the dog and wolf genome SNP database

    PubMed Central

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M.; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  15. The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology

    PubMed Central

    Wang, Dapeng; Xia, Yan; Li, Xinna; Hou, Lixia; Yu, Jun

    2013-01-01

    Over the past 10 years, genomes of cultivated rice cultivars and their wild counterparts have been sequenced although most efforts are focused on genome assembly and annotation of two major cultivated rice (Oryza sativa L.) subspecies, 93-11 (indica) and Nipponbare (japonica). To integrate information from genome assemblies and annotations for better analysis and application, we now introduce a comparative rice genome database, the Rice Genome Knowledgebase (RGKbase, http://rgkbase.big.ac.cn/RGKbase/). RGKbase is built to have three major components: (i) integrated data curation for rice genomics and molecular biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature; (ii) User-friendly viewers, such as Gbrowse, GeneBrowse and Circos, for genome annotations and evolutionary dynamics and (iii) Bioinformatic tools for compositional and synteny analyses, gene family classifications, gene ontology terms and pathways and gene co-expression networks. RGKbase current includes data from five rice cultivars and species: Nipponbare (japonica), 93-11 (indica), PA64s (indica), the African rice (Oryza glaberrima) and a wild rice species (Oryza brachyantha). We are also constantly introducing new datasets from variety of public efforts, such as two recent releases—sequence data from ∼1000 rice varieties, which are mapped into the reference genome, yielding ample high-quality single-nucleotide polymorphisms and insertions–deletions. PMID:23193278

  16. The YeastGenome app: the Saccharomyces Genome Database at your fingertips.

    PubMed

    Wong, Edith D; Karra, Kalpana; Hitz, Benjamin C; Hong, Eurie L; Cherry, J Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations. PMID:23396302

  17. Tetrahymena functional genomics database (TetraFGD): an integrated resource for Tetrahymena functional genomics.

    PubMed

    Xiong, Jie; Lu, Yuming; Feng, Jinmei; Yuan, Dongxia; Tian, Miao; Chang, Yue; Fu, Chengjie; Wang, Guangying; Zeng, Honghui; Miao, Wei

    2013-01-01

    The ciliated protozoan Tetrahymena thermophila is a useful unicellular model organism for studies of eukaryotic cellular and molecular biology. Researches on T. thermophila have contributed to a series of remarkable basic biological principles. After the macronuclear genome was sequenced, substantial progress has been made in functional genomics research on T. thermophila, including genome-wide microarray analysis of the T. thermophila life cycle, a T. thermophila gene network analysis based on the microarray data and transcriptome analysis by deep RNA sequencing. To meet the growing demands for the Tetrahymena research community, we integrated these data to provide a public access database: Tetrahymena functional genomics database (TetraFGD). TetraFGD contains three major resources, including the RNA-Seq transcriptome, microarray and gene networks. The RNA-Seq data define gene structures and transcriptome, with special emphasis on exon-intron boundaries; the microarray data describe gene expression of 20 time points during three major stages of the T. thermophila life cycle; the gene network data identify potential gene-gene interactions of 15 049 genes. The TetraFGD provides user-friendly search functions that assist researchers in accessing gene models, transcripts, gene expression data and gene-gene relationships. In conclusion, the TetraFGD is an important functional genomic resource for researchers who focus on the Tetrahymena or other ciliates. Database URL: http://tfgd.ihb.ac.cn/ PMID:23482072

  18. The YeastGenome app: the Saccharomyces Genome Database at your fingertips

    PubMed Central

    Wong, Edith D.; Karra, Kalpana; Hitz, Benjamin C.; Hong, Eurie L.; Cherry, J. Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD’s mission to provide free and open access to all its data and annotations. PMID:23396302

  19. Dual genome microarray: Fusarium verticillioides and Aspergillus flavus gene expression in co-culture

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins produced by Aspergillus flavus, and fumonisins produced by Fusarium verticillioides, are prominent among the mycotoxins associated with economic losses to the maize grain industry worldwide. F. verticillioides is also recognized as a systemic endophyte of maize that prevents opportunisti...

  20. Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integration of Plant Genome-Related Databases

    PubMed Central

    Asamizu, Erika; Ichihara, Hisako; Nakaya, Akihiro; Nakamura, Yasukazu; Hirakawa, Hideki; Ishii, Takahiro; Tamura, Takuro; Fukami-Kobayashi, Kaoru; Nakajima, Yukari; Tabata, Satoshi

    2014-01-01

    The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj is comprised of three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on the amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the cluster with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information of DNA markers, quantitative trait loci and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder and biofuel, researchers in the field of basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits and phenotypes of interest. The PGDBj will return the search results from the component DBs and various types of linked external DBs. PMID:24363285

  1. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics.

    PubMed

    Fei, Zhangjun; Joung, Je-Gun; Tang, Xuemei; Zheng, Yi; Huang, Mingyun; Lee, Je Min; McQuinn, Ryan; Tieman, Denise M; Alba, Rob; Klee, Harry J; Giovannoni, James J

    2011-01-01

    Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process microarray, metabolite and sRNA data sets archived in the database, respectively, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allow intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu. PMID:20965973

  2. fPoxDB: fungal peroxidase database for comparative genomics

    PubMed Central

    2014-01-01

    Background Peroxidases are a group of oxidoreductases which mediate electron transfer from hydrogen peroxide (H2O2) and organic peroxide to various electron acceptors. They possess a broad spectrum of impact on industry and fungal biology. There are numerous industrial applications using peroxidases, such as to catalyse highly reactive pollutants and to breakdown lignin for recycling of carbon sources. Moreover, genes encoding peroxidases play important roles in fungal pathogenicity in both humans and plants. For better understanding of fungal peroxidases at the genome-level, a novel genomics platform is required. To this end, Fungal Peroxidase Database (fPoxDB; http://peroxidase.riceblast.snu.ac.kr/) has been developed to provide such a genomics platform for this important gene family. Description In order to identify and classify fungal peroxidases, 24 sequence profiles were built and applied on 331 genomes including 216 from fungi and Oomycetes. In addition, NoxR, which is known to regulate NADPH oxidases (NoxA and NoxB) in fungi, was also added to the pipeline. Collectively, 6,113 genes were predicted to encode 25 gene families, presenting well-separated distribution along the taxonomy. For instance, the genes encoding lignin peroxidase, manganese peroxidase, and versatile peroxidase were concentrated in the rot-causing basidiomycetes, reflecting their ligninolytic capability. As a genomics platform, fPoxDB provides diverse analysis resources, such as gene family predictions based on fungal sequence profiles, pre-computed results of eight bioinformatics programs, similarity search tools, a multiple sequence alignment tool, domain analysis functions, and taxonomic distribution summary, some of which are not available in the previously developed peroxidase resource. In addition, fPoxDB is interconnected with other family web systems, providing extended analysis opportunities. Conclusions fPoxDB is a fungi-oriented genomics platform for peroxidases. The sequence

  3. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease

    PubMed Central

    Shimoyama, Mary; De Pons, Jeff; Hayman, G. Thomas; Laulederkind, Stanley J.F.; Liu, Weisong; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Tutaj, Marek; Wang, Shur-Jen; Worthey, Elizabeth; Dwinell, Melinda; Jacob, Howard

    2015-01-01

    The Rat Genome Database (RGD, http://rgd.mcw.edu) provides the most comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. RGD maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, QTL and experimental phenotype measurements across hundreds of strains. Data are submitted by researchers, acquired through bulk data pipelines or curated from published literature. Innovative software tools provide users with an integrated platform to query, mine, display and analyze valuable genomic and phenomic datasets for discovery and enhancement of their own research. This update highlights recent developments that reflect an increasing focus on: (i) genomic variation, (ii) phenotypes and diseases, (iii) data related to the environment and experimental conditions and (iv) datasets and software tools that allow the user to explore and analyze the interactions among these and their impact on disease. PMID:25355511

  4. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der Lawrence Berkeley Lab., CA )

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  5. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der |

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  6. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-01

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided. PMID:26631124

  7. The Saccharomyces Genome Database: Exploring Biochemical Pathways and Mutant Phenotypes.

    PubMed

    Cherry, J Michael

    2015-12-01

    Many biochemical processes, and the proteins and cofactors involved, have been defined for the eukaryote Saccharomyces cerevisiae. This understanding has been largely derived through the awesome power of yeast genetics. The proteins responsible for the reactions that build complex molecules and generate energy for the cell have been integrated into web-based tools that provide classical views of pathways. The Yeast Pathways in the Saccharomyces Genome Database (SGD) is, however, the only database created from manually curated literature annotations. In this protocol, gene function is explored using phenotype annotations to enable hypotheses to be formulated about a gene's action. A common use of the SGD is to understand more about a gene that was identified via a phenotypic screen or found to interact with a gene/protein of interest. There are still many genes that do not yet have an experimentally defined function and so the information currently available can be used to speculate about their potential function. Typically, computational annotations based on sequence similarity are used to predict gene function. In addition, annotations are sometimes available for phenotypes of mutations in the gene of interest. Integrated results for a few example genes will be explored in this protocol. This will be instructive for the exploration of details that aid the analysis of experimental results and the establishment of connections within the yeast literature. PMID:26631123

  8. The release of the Aspergillus flavus whole genome sequence for public access

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are produced by Aspergillus flavus and A. parasiticus. These toxic and carcinogenic compounds contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergilllus flavus wh...

  9. Aspergillus flavus whole genome and EST sequence releases and construction of homologous gene search blast server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic and carcinogenic secondary metabolites. These compounds, produced by Aspergillus flavus and A. parasiticus, contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed...

  10. Genome wide association mapping of Aspergillus flavus and aflatoxin accumulation resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Contamination of maize with aflatoxin, produced by the fungus Aspergillus flavus, has severe health and economic consequences. Efforts to reduce aflatoxin accumulation in maize have focused on identifying and selecting germplasm with natural host resistance factors, and several maize lines with sign...

  11. Aspergillus flavus Genomics as a Tool for Studying the Mechanism of Aflatoxin Formation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a pathogen that infects plants, animals, and humans. It produces the most potent carcinogens, known as aflatoxins, when it infects agricultural crops. In order to devise strategies to control aflatoxin contamination of pre-harvest agricultural crops and post harvest grains du...

  12. Aspergillus flavus Genomic Data Mining Provides Clues for Its Use in Producing Biobased Products

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is notorious for its ability to produce aflatoxins. It is also an opportunistic pathogen that infects plants, animals and human beings. The ability to survive in the natural environment, living on plant tissues (leaves or stalks), live or dead insects make A. flavus a ubiquitous...

  13. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  14. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. PMID:25398900

  15. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    DOE Data Explorer

    Loots, Gabriela G. [LLNL; Ovcharenko, I. [LLNL

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. This database of evolutionary conserved regions (ECRs) in vertebrate genomes features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites (TFBSs) in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and fugu genomes. (taken from paper in Journal: Bioinformatics, November 7, 2006, pp. 122-124

  16. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    SciTech Connect

    Loots, G; Ovcharenko, I

    2006-08-08

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes entitled ECRbase that is constructed from a collection of pairwise vertebrate genome alignments produced by the ECR Browser database. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a collection of promoters in all vertebrate genomes presented in the database. The database also contains a collection of annotated transcription factor binding sites (TFBS) in all ECRs and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and two pufferfish genomes. It is freely accessible at http://ECRbase.dcode.org.

  17. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

  18. Draft genome sequences of two closely-related aflatoxigenic Aspergillus species obtained from the Ivory Coast

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...

  19. Genome-scale analysis of the high-efficient protein secretion system of Aspergillus oryzae

    PubMed Central

    2014-01-01

    Background The koji mold, Aspergillus oryzae is widely used for the production of industrial enzymes due to its particularly high protein secretion capacity and ability to perform post-translational modifications. However, systemic analysis of its secretion system is lacking, generally due to the poorly annotated proteome. Results Here we defined a functional protein secretory component list of A. oryzae using a previously reported secretory model of S. cerevisiae as scaffold. Additional secretory components were obtained by blast search with the functional components reported in other closely related fungal species such as Aspergillus nidulans and Aspergillus niger. To evaluate the defined component list, we performed transcriptome analysis on three α-amylase over-producing strains with varying levels of secretion capacities. Specifically, secretory components involved in the ER-associated processes (including components involved in the regulation of transport between ER and Golgi) were significantly up-regulated, with many of them never been identified for A. oryzae before. Furthermore, we defined a complete list of the putative A. oryzae secretome and monitored how it was affected by overproducing amylase. Conclusion In combination with the transcriptome data, the most complete secretory component list and the putative secretome, we improved the systemic understanding of the secretory machinery of A. oryzae in response to high levels of protein secretion. The roles of many newly predicted secretory components were experimentally validated and the enriched component list provides a better platform for driving more mechanistic studies of the protein secretory pathway in this industrially important fungus. PMID:24961398

  20. Purification and enzymatic characterization of secretory glycoside hydrolase family 3 (GH3) aryl β-glucosidases screened from Aspergillus oryzae genome.

    PubMed

    Kudo, Kanako; Watanabe, Akira; Ujiie, Seiryu; Shintani, Takahiro; Gomi, Katsuya

    2015-12-01

    By a global search of the genome database of Aspergillus oryzae, we found 23 genes encoding putative β-glucosidases, among which 10 genes with a signal peptide belonging to glycoside hydrolase family 3 (GH3) were overexpressed in A. oryzae using the improved glaA gene promoter. Consequently, crude enzyme preparations from three strains, each harboring the genes AO090038000223 (bglA), AO090103000127 (bglF), and AO090003001511 (bglJ), showed a substrate preference toward p-nitrophenyl-β-d-glucopyranoside (pNPGlc) and thus were purified to homogeneity and enzymatically characterized. All the purified enzymes (BglA, BglF, and BglJ) preferentially hydrolyzed aryl β-glycosides, including pNPGlc, rather than cellobiose, and these enzymes were proven to be aryl β-glucosidases. Although the specific activity of BglF toward all the substrates tested was significantly low, BglA and BglJ showed appreciably high activities toward pNPGlc and arbutin. The kinetic parameters of BglA and BglJ for pNPGlc suggested that both the enzymes had relatively higher hydrolytic activity toward pNPGlc among the fungal β-glucosidases reported. The thermal and pH stabilities of BglA were higher than those of BglJ, and BglA was particularly stable in a wide pH range (pH 4.5-10). In contrast, BglJ was the most heat- and alkaline-labile among the three β-glucosidases. Furthermore, BglA was more tolerant to ethanol than BglJ; as a result, it showed much higher hydrolytic activity toward isoflavone glycosides in the presence of ethanol than BglJ. This study suggested that the mining of novel β-glucosidases exhibiting higher activity from microbial genome sequences is of great use for the production of beneficial compounds such as isoflavone aglycones. PMID:25936960

  1. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources

    PubMed Central

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/ PMID:26589635

  2. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    PubMed

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. PMID:26589635

  3. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter; van de Vondervoort, Peter; Culley, David E.; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy; Braus, Gerhard; Braus-Stromeyer, Susanna A.; Corrochano, Luis; Dai, Ziyu; van Dijck, Piet; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert; Pel, Herman J.; Poulsen, Lars; Samson, Rob; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; ATkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noel; Roubos, Johannes A.; Nielsen, Jens B.; Baker, Scott E.

    2011-06-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.

  4. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. PMID:23826978

  5. Biosynthetic Pathway for the Epipolythiodioxopiperazine Acetylaranotin in Aspergillus terreus Revealed by Genome-based Deletion Analysis

    SciTech Connect

    Guo, Chun-Jun; Yeh, Hsu-Hua; Chiang, Yi Ming; Sanchez, James F.; Chang, ShuLin; Bruno, Kenneth S.; Wang, Clay C.

    2013-04-15

    Abstract Epipolythiodioxopiperazines (ETPs) are a class of fungal secondary metabolites derived from cyclic peptides. Acetylaranotin belongs to one structural subgroup of ETPs characterized by the presence of a seven-membered dihydrooxepine ring. Defining the genes involved in acetylaranotin biosynthesis should provide a means to increase production of these compounds and facilitate the engineering of second-generation molecules. The filamentous fungus Aspergillus terreus produces acetylaranotin and related natural products. Using targeted gene deletions, we have identified a cluster of 9 genes including one nonribosomal peptide synthase gene, ataP, that is required for acetylaranotin biosynthesis. Chemical analysis of the wild type and mutant strains enabled us to isolate seventeen natural products that are either intermediates in the normal biosynthetic pathway or shunt products that are produced when the pathway is interrupted through mutation. Nine of the compounds identified in this study are novel natural products. Our data allow us to propose a complete biosynthetic pathway for acetylaranotin and related natural products.

  6. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; Vondervoot, Peter J.I. van de; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristen F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; Dijck, Piet W.M. van; Hofmann, Gerald; Lasure, Linda L.; Magnusson, Jon K.; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; Ooyen, Albert J.J. van; Panther, Kathyrn S.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hen; Tsang, Adrian; Brink, Johannes M. van den; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Kubicek, Christian P.; Martinez, Diego; Peij, Noel N.M.E. van; Roubos, Johannes A.; Nielsen, Jens

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters. Our results and datasets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.[Supplemental materials (10 figures, three text documents and 16 tables) have been made available

  7. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At the Maize Genetics and Genomics Database (MaizeGDB), we have developed a number of video tutorials that aim to demonstrate how to use various tools as well as to explici...

  8. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  9. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    ERIC Educational Resources Information Center

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  10. Characterization of the Mutagenic Spectrum of 4-Nitroquinoline 1-Oxide (4-NQO) in Aspergillus nidulans by Whole Genome Sequencing

    PubMed Central

    Downes, Damien J.; Chonofsky, Mark; Tan, Kaeling; Pfannenstiel, Brandon T.; Reck-Peterson, Samara L.; Todd, Richard B.

    2014-01-01

    4-Nitroquinoline 1-oxide (4-NQO) is a highly carcinogenic chemical that induces mutations in bacteria, fungi, and animals through the formation of bulky purine adducts. 4-NQO has been used as a mutagen for genetic screens and in both the study of DNA damage and DNA repair. In the model eukaryote Aspergillus nidulans, 4-NQO−based genetic screens have been used to study diverse processes, including gene regulation, mitosis, metabolism, organelle transport, and septation. Early work during the 1970s using bacterial and yeast mutation tester strains concluded that 4-NQO was a guanine-specific mutagen. However, these strains were limited in their ability to determine full mutagenic potential, as they could not identify mutations at multiple sites, unlinked suppressor mutations, or G:C to C:G transversions. We have now used a whole genome resequencing approach with mutant strains generated from two independent genetic screens to determine the full mutagenic spectrum of 4-NQO in A. nidulans. Analysis of 3994 mutations from 38 mutant strains reveals that 4-NQO induces substitutions in both guanine and adenine residues, although with a 19-fold preference for guanine. We found no association between mutation load and mutagen dose and observed no sequence bias in the residues flanking the mutated purine base. The mutations were distributed randomly throughout most of the genome. Our data provide new evidence that 4-NQO can potentially target all base pairs. Furthermore, we predict that current practices for 4-NQO−induced mutagenesis are sufficient to reach gene saturation for genetic screens with feasible identification of causative mutations via whole genome resequencing. PMID:25352541

  11. SoyBase, The USDA-ARS Soybean Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The...

  12. CURRENT STATUS OF THE COTTON DB, A GENOME DATABASE FOR GOSSYPIUM SPP

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cotton genome database, CottonDB, is a publicly available resource for cotton genome research ranging from the collections of Gossypium germplasm, molecular markers, to the functions of cotton genes. Curation of CottonDB is currently maintained in our Research Unit (http://algodon.tamu.edu/cotto...

  13. CURRENT STATUS OF THE COTTONDB, A GENOME DATABASE FOR GOSSYPIUM SPP

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cotton genome database, CottonDB, is a publicly available resource for cotton genome research ranging from the collections of Gossypium germplasm, molecular markers, to the functions of cotton genes. Curation of CottonDB is currently maintained in our Research Unit(http://algodon.tamu.edu/cotton...

  14. The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video

    PubMed Central

    Harper, Lisa C.; Schaeffer, Mary L.; Thistle, Jordan; Gardiner, Jack M.; Andorf, Carson M.; Campbell, Darwin A.; Cannon, Ethalinda K.S.; Braun, Bremen L.; Birkett, Scott M.; Lawrence, Carolyn J.; Sen, Taner Z.

    2011-01-01

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is ‘Using the MaizeGDB Genome Browser’, which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database URL: http://www.maizegdb.org/ PMID:21565781

  15. Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals.

    PubMed

    Lee, Yong Seok; Oh, Jeongsu; Kim, Young Uk; Kim, Namchul; Yang, Sungjin; Hwang, Ui Wook

    2008-01-01

    Mitome is a specialized mitochondrial genome database designed for easy comparative analysis of various features of metazoan mitochondrial genomes such as base frequency, A+T skew, codon usage and gene arrangement pattern. A particular function of the database is the automatic reconstruction of phylogenetic relationships among metazoans selected by a user from a taxonomic tree menu based on nucleotide sequences, amino acid sequences or gene arrangement patterns. Mitome also enables us (i) to easily find the taxonomic positions of organisms of which complete mitochondrial genome sequences are publicly available; (ii) to acquire various metazoan mitochondrial genome characteristics through a graphical genome browser; (iii) to search for homology patterns in mitochondrial gene arrangements; (iv) to download nucleotide or amino acid sequences not only of an entire mitochondrial genome but also of each component; and (v) to find interesting references easily through links with PubMed. In order to provide users with a dynamic, responsive, interactive and faster web database, Mitome is constructed using two recently highlighted techniques, Ajax (Asynchronous JavaScript and XML) and Web Services. Mitome has the potential to become very useful in the fields of molecular phylogenetics and evolution and comparative organelle genomics. The database is available at: http://www.mitome.info. PMID:17940090

  16. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  17. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  18. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  19. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  20. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  1. Databases in SenseLab for the Genomics, Protemics, and Function of Olfactory Receptors

    PubMed Central

    Marenco, Luis N.; Bahl, Gautam; Hyland, Lorra; Shi, Jing; Wang, Rixin; Lai, Peter C.; Miller, Perry L.; Shepherd, Gordon M.; Crasto, Chiquito J.

    2013-01-01

    We present here, the salient aspects of three databases: Olfactory Receptor Database (ORDB) is a repository of genomics and proteomics information of ORs; OdorDB stores information related to odorous compounds, specifically identitying those that have been shown to interact with olfactory rectors; and OdorModelDB disseminates information related to computational models of olfactory receptors (ORs). The data stored among these databases is integrated. Presented in this chapter are descriptions of these resources, which are part of the SenseLab suite of databases, a discussion of the computational infrastructure that enhances the efficacy of information storage, retrieval, dissemination, and automated data population from external sources. PMID:23585030

  2. Databases and web tools for cancer genomics study.

    PubMed

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repository and analysis tools; and we hope such introduction will promote the awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591

  3. Databases and Web Tools for Cancer Genomics Study

    PubMed Central

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-01-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repository and analysis tools; and we hope such introduction will promote the awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591

  4. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  5. PGSB PlantsDB: updates to the database framework for comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C.; Martis, Mihaela M.; Seidel, Michael; Kugler, Karl G.; Gundlach, Heidrun; Mayer, Klaus F.X.

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  6. Databases and information integration for the Medicago truncatula genome and transcriptome.

    PubMed

    Cannon, Steven B; Crow, John A; Heuer, Michael L; Wang, Xiaohong; Cannon, Ethalinda K S; Dwan, Christopher; Lamblin, Anne-Francoise; Vasdewani, Jayprakash; Mudge, Joann; Cook, Andrew; Gish, John; Cheung, Foo; Kenton, Steve; Kunau, Timothy M; Brown, Douglas; May, Gregory D; Kim, Dongjin; Cook, Douglas R; Roe, Bruce A; Town, Chris D; Young, Nevin D; Retzel, Ernest F

    2005-05-01

    An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de). PMID:15888676

  7. A Ruby API to query the Ensembl database for genomic features

    PubMed Central

    Strozzi, Francesco; Aerts, Jan

    2011-01-01

    Summary: The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Availability and Implementation: Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api. Contact: jan.aerts@esat.kuleuven.be PMID:21278190

  8. SNP-Seek database of SNPs derived from 3000 rice genomes.

    PubMed

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of Oracle database having a total number of rows with SNP genotypes close to 60 billion (20 M SNPs × 3 K rice lines) and web interface for convenient querying. The database allows quick retrieving of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  9. diArk – the database for eukaryotic genome and transcriptome assemblies in 2014

    PubMed Central

    Kollmar, Martin; Kollmar, Lotte; Hammesfahr, Björn; Simm, Dominic

    2015-01-01

    Eukaryotic genomes are the basis for understanding the complexity of life from populations to the molecular level. Recent technological innovations have revolutionized the speed of data generation enabling the sequencing of eukaryotic genomes and transcriptomes within days. The database diArk (http://www.diark.org) has been developed with the aim to provide access to all available assembled genomes and transcriptomes. In September 2014, diArk contains about 2600 eukaryotes with 6000 genome and transcriptome assemblies, of which 22% are not available via NCBI/ENA/DDBJ. Several indicators for the quality of the assemblies are provided to facilitate their comparison for selecting the most appropriate dataset for further studies. diArk has a user-friendly web interface with extensive options for filtering and browsing the sequenced eukaryotes. In this new version of the database we have also integrated species, for which transcriptome assemblies are available, and we provide more analyses of assemblies. PMID:25378341

  10. VibrioBase: a model for next-generation genome and annotation database development.

    PubMed

    Choo, Siew Woh; Heydari, Hamed; Tan, Tze King; Siow, Cheuk Chuen; Beh, Ching Yew; Wee, Wei Yee; Mutha, Naresh V R; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam

    2014-01-01

    To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development. PMID:25243218

  11. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    PubMed Central

    Choo, Siew Woh; Tan, Tze King; Mutha, Naresh V. R.; Wong, Guat Jah

    2014-01-01

    To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development. PMID:25243218

  12. Aspergillus: introduction

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species in the genus Aspergillus possess versatile metabolic activities that impact our daily life both positively and negatively. Aspergillus flavus and Aspergillus oryzae are closely related fungi. While the former is able to produce carcinogenic aflatoxins and is an etiological agent of aspergill...

  13. Tomato genomic resources database: an integrated repository of useful tomato genomic information for basic and applied research.

    PubMed

    Suresh, B Venkata; Roy, Riti; Sahu, Kamlesh; Misra, Gopal; Chattopadhyay, Debasis

    2014-01-01

    Tomato Genomic Resources Database (TGRD) allows interactive browsing of tomato genes, micro RNAs, simple sequence repeats (SSRs), important quantitative trait loci and Tomato-EXPEN 2000 genetic map altogether or separately along twelve chromosomes of tomato in a single window. The database is created using sequence of the cultivar Heinz 1706. High quality single nucleotide polymorphic (SNP) sites between the genes of Heinz 1706 and the wild tomato S. pimpinellifolium LA1589 are also included. Genes are classified into different families. 5'-upstream sequences (5'-US) of all the genes and their tissue-specific expression profiles are provided. Sequences of the microRNA loci and their putative target genes are catalogued. Genes and 5'-US show presence of SSRs and SNPs. SSRs located in the genomic, genic and 5'-US can be analysed separately for the presence of any particular motif. Primer sequences for all the SSRs and flanking sequences for all the genic SNPs have been provided. TGRD is a user-friendly web-accessible relational database and uses CMAP viewer for graphical scanning of all the features. Integration and graphical presentation of important genomic information will facilitate better and easier use of tomato genome. TGRD can be accessed as an open source repository at http://59.163.192.91/tomato2/. PMID:24466070

  14. Tomato Genomic Resources Database: An Integrated Repository of Useful Tomato Genomic Information for Basic and Applied Research

    PubMed Central

    Suresh, B. Venkata; Roy, Riti; Sahu, Kamlesh; Misra, Gopal; Chattopadhyay, Debasis

    2014-01-01

    Tomato Genomic Resources Database (TGRD) allows interactive browsing of tomato genes, micro RNAs, simple sequence repeats (SSRs), important quantitative trait loci and Tomato-EXPEN 2000 genetic map altogether or separately along twelve chromosomes of tomato in a single window. The database is created using sequence of the cultivar Heinz 1706. High quality single nucleotide polymorphic (SNP) sites between the genes of Heinz 1706 and the wild tomato S. pimpinellifolium LA1589 are also included. Genes are classified into different families. 5′-upstream sequences (5′-US) of all the genes and their tissue-specific expression profiles are provided. Sequences of the microRNA loci and their putative target genes are catalogued. Genes and 5′-US show presence of SSRs and SNPs. SSRs located in the genomic, genic and 5′-US can be analysed separately for the presence of any particular motif. Primer sequences for all the SSRs and flanking sequences for all the genic SNPs have been provided. TGRD is a user-friendly web-accessible relational database and uses CMAP viewer for graphical scanning of all the features. Integration and graphical presentation of important genomic information will facilitate better and easier use of tomato genome. TGRD can be accessed as an open source repository at http://59.163.192.91/tomato2/. PMID:24466070

  15. coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics

    PubMed Central

    Chaudhuri, Roy R.; Khan, Arshad M.; Pallen, Mark J.

    2004-01-01

    We have constructed coliBASE, a database for Escherichia coli, Shigella and Salmonella comparative genomics available online at http://colibase.bham.ac.uk. Unlike other E.coli databases, which focus on the laboratory model strain K12, coliBASE is intended to reflect the full diversity of E.coli and its relatives. The database contains comparative data including whole genome alignments and lists of putative orthologous genes, together with numerous analytical tools and links to existing online resources. The data are stored in a relational database, accessible by a number of user-friendly search methods and graphical browsers. The database schema is generic and can easily be applied to other bacterial genomes. Two such databases, CampyDB (for the analysis of Campylobacter spp.) and ClostriDB (for Clostridium spp.) are also available at http://campy.bham.ac.uk and http://clostri.bham.ac.uk, respectively. An example of the power of E.coli comparative analyses such as those available through coliBASE is presented. PMID:14681417

  16. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA

    PubMed Central

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A. Patrick; Coutts, Robert H. A.

    2015-01-01

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine–dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S–rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  17. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA.

    PubMed

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A Patrick; Coutts, Robert H A

    2015-07-21

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine-dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S-rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  18. An integrated computational pipeline and database to support whole-genome sequence annotation

    PubMed Central

    Mungall, CJ; Misra, S; Berman, BP; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, JS; Prochnik, SE; Smith, CD; Smith, E; Tupy, JL; Wiel, C; Rubin, GM; Lewis, SE

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture. PMID:12537570

  19. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    PubMed Central

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  20. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes.

    PubMed

    Bhawna; Bonthala, V S; Gajula, Mnv Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely.Database URL: http://www.multiomics.in/PvTFDB/. PMID:27465131

  1. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases.Comparative genome analysis tools lead to further understanding of evolution processes quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring an order to this genome sequence shockwave and improve the usability of associated data. PMID:27115625

  2. The PRRS Host Genomic Consortium (PHGC) Database: Management of large data sets.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In any consortium project where large amounts of phenotypic and genotypic data are collected across several research labs, issues arise with maintenance and analysis of datasets. The PRRS Host Genomic Consortium (PHGC) Database was developed to meet this need for the PRRS research community. The sch...

  3. Development of a grape genomics database using IBM DB2 content manager software

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A relational database was created for the North American Grapevine Genome project at the Viticultural Research Center, at Florida A&M University. The collaborative project with USDA, ARS researchers is an important resource for viticulture production of new grapevine varieties which will be adapted ...

  4. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  5. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  6. Vitigene: A database for grape genomics and genetic resources delivery that benefits grape growers and scientific communities

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A new grape genomic database was established for ‘Native American Grape Species’ as a genomic resource (http://vitigene.famu.edu:9082/eclient/ IDMLogon2.jsp). The new database hosts genetic information collected from disease tolerant/resistant grapevine endemic to North America, and is a valuable re...

  7. SpBase: the sea urchin genome database and web site

    PubMed Central

    Cameron, R. Andrew; Samanta, Manoj; Yuan, Autumn; He, Dong; Davidson, Eric

    2009-01-01

    SpBase is a system of databases focused on the genomic information from sea urchins and related echinoderms. It is exposed to the public through a web site served with open source software (http://spbase.org/). The enterprise was undertaken to provide an easily used collection of information to directly support experimental work on these useful research models in cell and developmental biology. The information served from the databases emerges from the draft genomic sequence of the purple sea urchin, Strongylocentrotus purpuratus and includes sequence data and genomic resource descriptions for other members of the echinoderm clade which in total span 540 million years of evolutionary time. This version of the system contains two assemblies of the purple sea urchin genome, associated expressed sequences, gene annotations and accessory resources. Search mechanisms for the sequences and the gene annotations are provided. Because the system is maintained along with the Sea Urchin Genome resource, a database of sequenced clones is also provided. PMID:19010966

  8. Towards a Universal Clinical Genomics Database: The 2012 International Standards for Cytogenomic Arrays (ISCA) Consortium Meeting

    PubMed Central

    Riggs, Erin Rooney; Wain, Karen E.; Riethmaier, Darlene; Savage, Melissa; Smith-Packard, Bethanny; Kaminsky, Erin B.; Rehm, Heidi L.; Martin, Christa Lese; Ledbetter, David H.; Faucett, W. Andrew

    2013-01-01

    The 2012 International Standards for Cytogenomic Arrays (ISCA) Consortium Meeting, “Towards a Universal Clinical Genomic Database,” was held in Bethesda, MD, 21–22 May 2012 and was attended by over 200 individuals from around the world representing clinical genetic testing laboratories, clinicians, academia, industry, research, and regulatory agencies. The scientific program centered on expanding the current focus of the ISCA Consortium to include the collection and curation of both structural and sequence-level variation into a unified clinical genomics database, available to the public through resources such as the National Center for Biotechnology Information (NCBI)’s ClinVar database. Here, we provide an overview of the conference, with summaries of the topics presented for discussion by over 25 different speakers. Presentations are available online at www.iscaconsortium.org. PMID:23463607

  9. The HuGeMap Database: interconnection and visualization of human genome maps.

    PubMed

    Barillot, E; Pook, S; Guyon, F; Cussat-Blanc, C; Viara, E; Vaysseix, G

    1999-01-01

    The HuGeMap database stores the major genetic and physical maps of the human genome. HuGeMap is accessible on the Web at http://www. infobiogen.fr/services/Hugemap and through a CORBA server. A standard genome map data format for the interconnection of genome map databases was defined in collaboration with the EBI. The HuGeMap CORBA server provides this interconnection using the interface definition language IDL. Two graphical user interfaces were developed for the visualization of the HuGeMap data: ZoomMap (http://www.infobiogen.fr/services/zomit/Zoom Map.html) for navigation by zooming and data transformation via magic lenses, and MappetShow (http://www.infobiogen.fr/services/Mappet) for visualizing and comparing maps. PMID:9847155

  10. Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

    PubMed Central

    Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

    2014-01-01

    The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the amount and type encoded in fungal genomes varies considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. On the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134

  11. Design and implementation of a database for Brucella melitensis genome annotation.

    PubMed

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To made these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can carry out different kinds of information by three levels of queries: (1) the basic search use the classical keywords or sequence identifiers; (2) the original advanced search engine allows to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe. PMID:18160234

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  13. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes

    PubMed Central

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons—codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics. PMID:26543854

  14. Manual curation is not sufficient for annotation of genomic databases

    PubMed Central

    Baumgartner, William A.; Cohen, K. Bretonnel; Fox, Lynne M.; Acquaah-Mensah, George; Hunter, Lawrence

    2008-01-01

    Motivation Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents. Results Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes. Contact larry.hunter@uchsc.edu PMID:17646325

  15. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  16. CottonGen: a genomics, genetics and breeding database for cotton research.

    PubMed

    Yu, Jing; Jung, Sook; Cheng, Chun-Huai; Ficklin, Stephen P; Lee, Taein; Zheng, Ping; Jones, Don; Percy, Richard G; Main, Dorrie

    2014-01-01

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST. PMID:24203703

  17. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    PubMed

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. PMID:27242033

  18. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

    PubMed Central

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033

  19. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  20. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered as potential cancer targets. In dbEM, modifiers are classified on functional basis and comprise of 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains the genomic information like mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources viz. COSMIC, CCLE and 1000-genome project. Gene essentiality data retrieved from COLT database further highlights the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information of 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in variety of cancers. We are confident that dbEM will be very useful in cancer research particularly in the field of epigenetic proteins based cancer therapeutics. This database is available for public at URL: http://crdd.osdd.net/raghava/dbem. PMID:26777304

  1. RiceVarMap: a comprehensive database of rice genomic variations

    PubMed Central

    Zhao, Hu; Yao, Wen; Ouyang, Yidan; Yang, Wanneng; Wang, Gongwei; Lian, Xingming; Xing, Yongzhong; Chen, Lingling; Xie, Weibo

    2015-01-01

    Rice Variation Map (RiceVarMap, http:/ricevarmap.ncpgr.cn) is a database of rice genomic variations. The database provides comprehensive information of 6 551 358 single nucleotide polymorphisms (SNPs) and 1 214 627 insertions/deletions (INDELs) identified from sequencing data of 1479 rice accessions. The SNP genotypes of all accessions were imputed and evaluated, resulting in an overall missing data rate of 0.42% and an estimated accuracy greater than 99%. The SNP/INDEL genotypes of all accessions are available for online query and download. Users can search SNPs/INDELs by identifiers of the SNPs/INDELs, genomic regions, gene identifiers and keywords of gene annotation. Allele frequencies within various subpopulations and the effects of the variation that may alter the protein sequence of a gene are also listed for each SNP/INDEL. The database also provides geographical details and phenotype images for various rice accessions. In particular, the database provides tools to construct haplotype networks and design PCR-primers by taking into account surrounding known genomic variations. These data and tools are highly useful for exploring genetic variations and evolution studies of rice and other species. PMID:25274737

  2. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  3. Biological database of images and genomes: tools for community annotations linking image and genomic information.

    PubMed

    Oberlin, Andrew T; Jurkovic, Dominika A; Balish, Mitchell F; Friedberg, Iddo

    2013-01-01

    Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype-genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas. PMID:23550062

  4. Genome Data from DOOR: a Database for prOkaryotic OpeRons

    DOE Data Explorer

    DOOR (Database of prOkaryotic OpeRons) is an operon database developed by Computational Systems Biology Lab (CSBL) at University of Georgia. Although the operons in the database are based on prediction, there are some unique features. These are: • A algorithm is consistently best at all aspects including sensitivity and specificity for both true positives and true negatives, and the overall accuracy reaches 90 percent. The prediction algorithm is based on this paper: P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu., Operon prediction using both genome-specific and general genomic information, Nucleic Acids Res., 35(1):288-98, 2007 • DOOR provides one of the largest data sets of operon information available to the public. DOOR provides operons for 675 prokaryotic genomes. Although most of operons in DOOR are not verified by experiments, the creators are also trying to provide some limited literature information, which is extracted from ODB. They emphasize that if the users are looking for strictly experimentally verified operons, they should look into DBTBS and RegulonDB first. • Operons which include RNA genes, which are rarely seen in other operon databases especially for predicted operon databases • Defined the similarity scores between operons, which is based on weighted maximum matching between operons. Similar operon groups can be used to predict accurate orthologous genes,and their upstream regions can be used to find the consensus binding motifs. • Integration of two motif finding programs in the database: MEME and CUBIC. DOOR provides an Organism View for browsing, a gene search tool, an operon search tool, and the operon prediction interface.[Text taken and edited from http://csbl1.bmb.uga.edu/OperonDB/tutorial.php

  5. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  6. Scriptable access to the Caenorhabditis elegans genome sequence and other ACEDB databases.

    PubMed

    Stein, L D; Thierry-Mieg, J

    1998-12-01

    Much of the world's genomic data are available to the community through networked databases that are accessed via Web interfaces. Although this paradigm provides browse-level access and has greatly facilitated linking between databases, it does not provide any convenient mechanism for programmatically fetching and integrating data from diverse databases. We have created a library and an application programming interface (API) named AcePerl that provides simple, direct access to ACEDB databases from the Perl programming language. With this library, programmers and computer-savvy biologists can write software to pose complex queries on local and remote ACEDB databases, retrieve the data, integrate the results, and move data objects from one database to another. In addition, a set of Web scripts running on top of AcePerl provides Web-based browsing of any local or remote ACEDB database. AcePerl and the AceBrowser Web browser run on Unix systems and are available under a license that allows for unrestricted use and redistribution. Both packages can be downloaded from URL. A Microsoft Windows port of AcePerl is in the planning stages. PMID:9872985

  7. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. PMID:27278816

  8. SorghumFDB: sorghum functional genomics database with multidimensional network analysis

    PubMed Central

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein–protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  9. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants.Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  10. Elucidation of veA Dependent Genes Associated with Aflatoxin and Sclerotial Production in Aspergillus flavus by Functional Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The aflatoxin-producing fungi, Aspergillus flavus and A. parasiticus, form structures called sclerotia that allow for survival under adverse conditions. Deletion of the veA gene in A. flavus and A. parasiticus blocks production of aflatoxin, as well as sclerotial formation. We used microarray tech...

  11. Genome sequence of Aspergillus flavus NRRL 3357, a strain that causes aflatoxin contamination of food and feed

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination of food and livestock feed results in significant annual crop losses internationally. Aspergillus flavus is the major fungus responsible for this loss. Additionally, A. flavus is the second leading cause of aspergillosis in immune compromised human patients. Here we report th...

  12. Use of functional genomics to assess the impact of climate change on Aspergillus flavus and aflatoxin production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is an opportunistic pathogenic fungus that infects several crops of agricultural importance, among them, corn, cotton, and peanuts. Once established as a pathogen the fungus may secrete secondary metabolites commonly known as mycotoxins, that if consumed by humans or animals may r...

  13. EchoBASE: an integrated post-genomic database for Escherichia coli.

    PubMed

    Misra, Raju V; Horler, Richard S P; Reindl, Wolfgang; Goryanin, Igor I; Thomas, Gavin H

    2005-01-01

    EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E.coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions. PMID:15608209

  14. A genome-wide microsatellite polymorphism database for the indica and japonica rice.

    PubMed

    Zhang, Zhonghua; Deng, Yajun; Tan, Jun; Hu, Songnian; Yu, Jun; Xue, Qingzhong

    2007-02-28

    Microsatellite (MS) polymorphism is an important source of genetic diversity, providing support for map-based cloning and molecular breeding. We have developed a new database that contains 52 845 polymorphic MS loci between indica and japonica, composed of ample Class II MS markers, and integrated 18 828 MS loci from IRGSP and genetic markers from RGP. Based on genetic marker positions on the rice genome (http://rise.genomics.org.cn/rice2/index.jsp ), we determined the approximate genetic distances of these MS loci and validated 100 randomly selected markers experimentally with 90% success rate. In addition, we recorded polymorphic MS positions in indica cv. 9311 that is the most important paternal parent of the two-line hybrid rice in China. Our database will undoubtedly facilitate the application of MS markers in genetic researches and marker-assisted breeding. The data set is freely available from www.wigs.zju.edu.cn/achievment/polySSR. PMID:17452422

  15. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype.

    PubMed

    Linz, John E; Wee, Josephine; Roze, Ludmila V

    2014-08-01

    The filamentous fungi Aspergillus parasiticus and Aspergillus flavus produce the carcinogenic secondary metabolite aflatoxin on susceptible crops. These species differ in the quantity of aflatoxins B1, B2, G1, and G2 produced in culture, in the ability to produce the mycotoxin cyclopiazonic acid, and in morphology of mycelia and conidiospores. To understand the genetic basis for differences in biochemistry and morphology, we conducted next-generation sequence (NGS) analysis of the A. parasiticus strain SU-1 genome and comparative gene expression (RNA sequence analysis [RNA Seq]) analysis of A. parasiticus SU-1 and A. flavus strain NRRL 3357 (3357) grown under aflatoxin-inducing and -noninducing culture conditions. Although A. parasiticus SU-1 and A. flavus 3357 are highly similar in genome structure and gene organization, we observed differences in the presence of specific mycotoxin gene clusters and differential expression of specific mycotoxin genes and gene clusters that help explain differences in the type and quantity of mycotoxins synthesized. Using computer-aided analysis of secondary metabolite clusters (antiSMASH), we demonstrated that A. parasiticus SU-1 and A. flavus 3357 may carry up to 93 secondary metabolite gene clusters, and surprisingly, up to 10% of the genome appears to be dedicated to secondary metabolite synthesis. The data also suggest that fungus-specific zinc binuclear cluster (C6) transcription factors play an important role in regulation of secondary metabolite cluster expression. Finally, we identified uniquely expressed genes in A. parasiticus SU-1 that encode C6 transcription factors and genes involved in secondary metabolism and stress response/cellular defense. Future work will focus on these differentially expressed A. parasiticus SU-1 loci to reveal their role in determining distinct species characteristics. PMID:24951444

  16. ChiloDB: a genomic and transcriptome database for an important rice insect pest Chilo suppressalis

    PubMed Central

    Yin, Chuanlin; Liu, Ying; Liu, Jinding; Xiao, Huamei; Huang, Shuiqing; Lin, Yongjun; Han, Zhaojun; Li, Fei

    2014-01-01

    ChiloDB is an integrated resource that will be of use to the rice stem borer research community. The rice striped stem borer (SSB), Chilo suppressalis Walker, is a major rice pest that causes severe yield losses in most rice-producing countries. A draft genome of this insect is available. The aims of ChiloDB are (i) to store recently acquired genomic sequence and transcriptome data and integrate them with protein-coding genes, microRNAs, piwi-interacting RNAs (piRNAs) and RNA sequencing (RNA-Seq) data and (ii) to provide comprehensive search tools and downloadable data sets for comparative genomics and gene annotation of this important rice pest. ChiloDB contains the first version of the official SSB gene set, comprising 80 479 scaffolds and 10 221 annotated protein-coding genes. Additionally, 262 SSB microRNA genes predicted from a small RNA library, 82 639 piRNAs identified using the piRNApredictor software, 37 040 transcripts from a midgut transcriptome and 69 977 transcripts from a mixed sample have all been integrated into ChiloDB. ChiloDB was constructed using a data structure that is compatible with data resources, which will be incorporated into the database in the future. This resource will serve as a long-term and open-access database for research on the biology, evolution and pest control of SSB. To the best of our knowledge, ChiloDB is one of the first genomic and transcriptome database for rice insect pests. Database URL: http://ento.njau.edu.cn/ChiloDB. PMID:24997141

  17. ChiloDB: a genomic and transcriptome database for an important rice insect pest Chilo suppressalis.

    PubMed

    Yin, Chuanlin; Liu, Ying; Liu, Jinding; Xiao, Huamei; Huang, Shuiqing; Lin, Yongjun; Han, Zhaojun; Li, Fei

    2014-01-01

    ChiloDB is an integrated resource that will be of use to the rice stem borer research community. The rice striped stem borer (SSB), Chilo suppressalis Walker, is a major rice pest that causes severe yield losses in most rice-producing countries. A draft genome of this insect is available. The aims of ChiloDB are (i) to store recently acquired genomic sequence and transcriptome data and integrate them with protein-coding genes, microRNAs, piwi-interacting RNAs (piRNAs) and RNA sequencing (RNA-Seq) data and (ii) to provide comprehensive search tools and downloadable data sets for comparative genomics and gene annotation of this important rice pest. ChiloDB contains the first version of the official SSB gene set, comprising 80,479 scaffolds and 10 221 annotated protein-coding genes. Additionally, 262 SSB microRNA genes predicted from a small RNA library, 82 639 piRNAs identified using the piRNApredictor software, 37,040 transcripts from a midgut transcriptome and 69 977 transcripts from a mixed sample have all been integrated into ChiloDB. ChiloDB was constructed using a data structure that is compatible with data resources, which will be incorporated into the database in the future. This resource will serve as a long-term and open-access database for research on the biology, evolution and pest control of SSB. To the best of our knowledge, ChiloDB is one of the first genomic and transcriptome database for rice insect pests. Database URL: http://ento.njau.edu.cn/ChiloDB. PMID:24997141

  18. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  19. A rapid classification protocol for the CATH Domain Database to support structural genomics

    PubMed Central

    Pearl, Frances M. G.; Martin, Nigel; Bray, James E.; Buchan, Daniel W. A.; Harrison, Andrew P.; Lee, David; Reeves, Gabrielle A.; Shepherd, Adrian J.; Sillitoe, Ian; Todd, Annabel E.; Thornton, Janet M.; Orengo, Christine A.

    2001-01-01

    In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153–165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homo­logous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of

  20. A rapid classification protocol for the CATH Domain Database to support structural genomics.

    PubMed

    Pearl, F M; Martin, N; Bray, J E; Buchan, D W; Harrison, A P; Lee, D; Reeves, G A; Shepherd, A J; Sillitoe, I; Todd, A E; Thornton, J M; Orengo, C A

    2001-01-01

    In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non

  1. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events). PMID:23161689

  2. RPFdb: a database for genome wide information of translated mRNA generated from ribosome profiling.

    PubMed

    Xie, Shang-Qian; Nie, Peng; Wang, Yan; Wang, Hongwei; Li, Hongyu; Yang, Zhilong; Liu, Yizhi; Ren, Jian; Xie, Zhi

    2016-01-01

    Translational control is crucial in the regulation of gene expression and deregulation of translation is associated with a wide range of cancers and human diseases. Ribosome profiling is a technique that provides genome wide information of mRNA in translation based on deep sequencing of ribosome protected mRNA fragments (RPF). RPFdb is a comprehensive resource for hosting, analyzing and visualizing RPF data, available at www.rpfdb.org or http://sysbio.sysu.edu.cn/rpfdb/index.html. The current version of database contains 777 samples from 82 studies in 8 species, processed and reanalyzed by a unified pipeline. There are two ways to query the database: by keywords of studies or by genes. The outputs are presented in three levels. (i) Study level: including meta information of studies and reprocessed data for gene expression of translated mRNAs; (ii) Sample level: including global perspective of translated mRNA and a list of the most translated mRNA of each sample from a study; (iii) Gene level: including normalized sequence counts of translated mRNA on different genomic location of a gene from multiple samples and studies. To explore rich information provided by RPF, RPFdb also provides a genome browser to query and visualize context-specific translated mRNA. Overall our database provides a simple way to search, analyze, compare, visualize and download RPF data sets. PMID:26433228

  3. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. PMID:27287925

  4. HGD-Chn: The Database of Genome Diversity and Variation for Chinese Populations.

    PubMed

    Hong-Sheng, Gui; Peng, Zhou; Cheng-Bo, Yang; Sheng-Bin, Li

    2009-04-01

    The Database of Genome Diversity and Variation for Chinese Populations is toward a more efficient utilization and sharing of the valuable yet diminishing genetic resources in China (including sample information of healthy populations, healthy pedigrees, disease population and disease pedigrees; genomic diversity data; disease-related allelic and haplotype data). Organization of the database can be divided into two parts: (1) Genetic resources of healthy people--Organizing genetic resources of healthy people. A variety of genetic markers (VNTR, STR, SNP, HLA, and enzyme markers, etc.) are chosen for their diversity among populations, with their distribution among different ethnic groups in China stored in the form of allelic frequency. A further analysis as well as an overall description of the Chinese population genetic structure is also being made possible. (2) Disease genetic resources--Four categories are mainly concerned: chromosomal diseases, monogenic diseases, polygenic diseases, and birth defects. For each kind of disease, the basic introduction and description, sample information, and allelic data of related gene are involved. Aside from research-oriented information, introductory courses oriented at general public covering fields of genomic diversity and variation, the related experimental techniques, standards and specifications could also be accessed in our website. Further more, flexible query and submit system with user-friendly interfaces are also integrated in our website to simplify the process of user-query and administrators' database maintenance work. Online data analyzing and managing tools are developed using bioinformatics algorithm and programming language for a better interpretation of the biological data. PMID:19342283

  5. Adding semantics to genome databases: towards an ontology for molecular biology.

    PubMed

    Schulze-Kremer, S

    1997-01-01

    Molecular biology has a communication problem. There are many databases using their own labels and categories for storing data objects and some using identical labels and categories but with a different meaning. Conversely, one concept is often found under different names. Prominent examples are the concepts "gene" and "protein sequence" which are used with different semantics by major international genomic and protein databases thereby making database integration difficult and strenuous. This situation can only be improved by either defining individual semantic interfaces between each pair of databases (complexity of order n2) or by implementing one agreeable, transparent and computationally tractable semantic repository and linking each database to it (complexity of order n). Ontologies are one means to provide such semantic repository by explicitly specifying the meaning of and relation between the fundamental concepts in an application domain. Here, heuristics for building an ontology and the upper level and a database branch of a prospective Ontology for Molecular Biology are presented and compared to other ontologies with respect to suitability for molecular biology (http:/(/)igd.rz-berlin.mpg.de/www/oe/mbo.html). PMID:9322049

  6. PeanutMap: an online genome database for comparative molecular maps of peanut

    PubMed Central

    Jesubatham, Arun M; Burow, Mark D

    2006-01-01

    Background Molecular maps have been developed for many species, and are of particular importance for varietal development and comparative genomics. However, despite the existence of multiple sets of linkage maps, databases of these data are lacking for many species, including peanut. Description PeanutMap provides a web-based interface for viewing specific linkage groups of a map set. PeanutMap can display and compare multiple maps of a set based upon marker or trait correspondences, which is particularly important as cultivated peanut is a disomic tetraploid. The database can also compare linkage groups among multiple map sets, allowing identification of corresponding linkage groups from results of different research projects. Data from the two published peanut genome map sets, and also from three maps sets of phenotypic traits are present in the database. Data from PeanutMap have been incorporated into the Legume Information System website to allow peanut map data to be used for cross-species comparisons. Conclusion The utility of the database is expected to increase as several SSR-based maps are being developed currently, and expanded efforts for comparative mapping of legumes are underway. Optimal use of these data will benefit from the development of tools to facilitate comparative analysis. PMID:16904007

  7. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future. PMID:21948792

  8. SmedGD 2.0: The Schmidtea mediterranea genome database

    PubMed Central

    Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2016-01-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588

  9. The MiST2 database: a comprehensive genomics resource on microbial signal transduction

    PubMed Central

    Ulrich, Luke E.; Zhulin, Igor B.

    2010-01-01

    The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. These are identified by searching protein sequences for specific domain profiles that implicate a protein in signal transduction. Compared to the previous version of the database, MiST2 contains a host of new features and improvements including the following: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis (6%), and finally ECF factors (2%). PMID:19900966

  10. SmedGD 2.0: The Schmidtea mediterranea genome database.

    PubMed

    Robb, Sofia M C; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2015-08-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next-generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588

  11. Exploring novel candidate genes from the Mouse Genome Informatics database: Potential implications for avian migration research.

    PubMed

    Contina, Andrea; Bridge, Eli S; Kelly, Jeffrey F

    2016-07-01

    To search for genes associated with migratory phenotypes in songbirds, we selected candidate genes through annotations from the Mouse Genome Informatics database and assembled an extensive candidate-gene library. Then, we implemented a next-generation sequencing approach to obtain DNA sequences from the Painted Bunting genome. We focused on those sequences that were conserved across avian species and that aligned with candidate genes in our mouse library. We genotyped short sequence repeats from the following candidate genes: ADRA1d, ANKRD17, CISH and MYH7. We studied the possible correlations between allelic variations occurring in these novel candidate migration genes and avian migratory phenotypes available from the published literature. We found that allele variation at MYH7 correlated with a calculated index of speed of migration (km/day) across 11 species of songbirds. We highlight the potential of the Mouse Genome Informatics database in providing new candidate genes that might play a crucial role in regulating migration in birds and possibly in other taxa. Our research effort shows the benefits and limitations of working with extensive genomic datasets and offers a snapshot of the challenges related to cross-species validation in behavioral and molecular ecology studies. PMID:27061206

  12. The Generic Genome Browser: A Building Block for a Model Organism System Database

    PubMed Central

    Stein, Lincoln D.; Mungall, Christopher; Shu, ShengQiang; Caudy, Michael; Mangone, Marco; Day, Allen; Nickerson, Elizabeth; Stajich, Jason E.; Harris, Todd W.; Arva, Adrian; Lewis, Suzanna

    2002-01-01

    The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org. PMID:12368253

  13. PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database

    PubMed Central

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-01-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. PMID:26903996

  14. Construction of a promoter probe vector autonomously maintained in Aspergillus and characterization of promoter regions derived from A. niger and A. oryzae genomes.

    PubMed

    Ozeki, K; Kanda, A; Hamachi, M; Nunokawa, Y

    1996-03-01

    We used a plasmid carrying a sequence for autonomous maintenance in Aspergillus (AMA1) and the E. coli uidA gene as a reporter gene to search the A. oryzae and A. niger genomes for DNA fragments having strong promoter activity. Beta-glucuronidase (GUS)-producing A. oryzae transformants containing the No. 8AN derived from A. niger, or the No. 9AO derived from A. oryzae, were constitutive for the expression of the uidA gene when cultivated in the presence of a variety of carbon and nitrogen sources. When the GUS-producing transformants were grown in liquid culture, the No. 8AN showed an increase of approximately 3-fold in GUS activity compared to the amyB (alpha-amylase encoding gene) promoter. There was also a corresponding increase in the amount of GUS gene-specific mRNA. When these transformants were grown as rice-koji, the No. 8AN showed an increase of approximately 6-fold compared to the amyB promoter, and the amount of GUS protein produced also increased. These strong promoter regions might be applicable to the production of other heterologous proteins in Aspergillus species. PMID:8901095

  15. SymbioGenomesDB: a database for the integration and access to knowledge on host–symbiont relationships

    PubMed Central

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic–host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  16. SymbioGenomesDB: a database for the integration and access to knowledge on host-symbiont relationships.

    PubMed

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic-host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  17. ReplicationDomain: a visualization tool and comparative database for genome-wide replication timing data

    PubMed Central

    Weddington, Nodin; Stuy, Alexander; Hiratani, Ichiro; Ryba, Tyrone; Yokochi, Tomoki; Gilbert, David M

    2008-01-01

    Background Eukaryotic DNA replication is regulated at the level of large chromosomal domains (0.5–5 megabases in mammals) within which replicons are activated relatively synchronously. These domains replicate in a specific temporal order during S-phase and our genome-wide analyses of replication timing have demonstrated that this temporal order of domain replication is a stable property of specific cell types. Results We have developed ReplicationDomain as a web-based database for analysis of genome-wide replication timing maps (replication profiles) from various cell lines and species. This database also provides comparative information of transcriptional expression and is configured to display any genome-wide property (for instance, ChIP-Chip or ChIP-Seq data) via an interactive web interface. Our published microarray data sets are publicly available. Users may graphically display these data sets for a selected genomic region and download the data displayed as text files, or alternatively, download complete genome-wide data sets. Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets. Upon uploading, registered users may choose to: (1) view their data sets privately without sharing; (2) share with other registered users; or (3) make their published or "in press" data sets publicly available, which can fulfill journal and funding agencies' requirements for data sharing. Conclusion ReplicationDomain is a novel and powerful tool to facilitate the comparative visualization of replication timing in various cell types as well as other genome-wide chromatin features and is considerably faster and more convenient than existing browsers when viewing multi-megabase segments of chromosomes. Furthermore, the data upload function with the option of private viewing or sharing of data sets between registered users should be a valuable resource for the scientific community. PMID:19077204

  18. Fine De Novo Sequencing of a Fungal Genome Using only SOLiD Short Read Data: Verification on Aspergillus oryzae RIB40

    PubMed Central

    Takeda, Itaru; Hagiwara, Hiroko; Ikegami, Tsutomu; Koike, Hideaki; Machida, Masayuki

    2013-01-01

    The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated from NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, the SOLiD read data with lengths of <60 bp have been considered to be too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only SOLiD read data of 50 bp generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, a 22-fold increase than those obtained using only SOLiD short read in other published reports. In addition, almost 99% of the reference genome was accurately aligned by the assembled scaffold fragments in long lengths. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also highly reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low quality data, library insert size, and k-mer size on the assembly performance, and recommend for the assembly use of mild filtered read data where the N50 was not so degraded and the library has an insert size of ∼2.0 kb, and k-mer size 33. PMID:23667655

  19. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    SciTech Connect

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-07-23

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of studies. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National laboratory and is publicly available at http://atgc.lbl.gov.

  20. Construction of an Ortholog Database Using the Semantic Web Technology for Integrative Analysis of Genomic Data

    PubMed Central

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis. PMID:25875762

  1. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    PubMed

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis. PMID:25875762

  2. Databases of genomic variation and phenotypes: existing resources and future needs

    PubMed Central

    Johnston, Jennifer J.; Biesecker, Leslie G.

    2013-01-01

    Massively parallel sequencing (MPS) has become an important tool for identifying medically significant variants in both research and the clinic. Accurate variation and genotype–phenotype databases are critical in our ability to make sense of the vast amount of information that MPS generates. The purpose of this review is to summarize the state of the art of variation and genotype–phenotype databases, how they can be used, and opportunities to improve these resources. Our working assumption is that the objective of the clinical genomicist is to identify highly penetrant variants that could explain existing disease or predict disease risk for individual patients or research participants. We have detailed how current databases contribute to this goal providing frequency data, literature reviews and predictions of causation for individual variants. For variant annotation, databases vary greatly in their ease of use, the use of standard mutation nomenclature, the comprehensiveness of the variant cataloging and the degree of expert opinion. Ultimately, we need a dynamic and comprehensive reference database of medically important variants that is easily cross referenced to exome and genome sequence data and allows for an accumulation of expert opinion. PMID:23962721

  3. GrainGenes, the genome database for small-grain crops.

    PubMed

    Matthews, David E; Carollo, Victoria L; Lazo, Gerard R; Anderson, Olin D

    2003-01-01

    GrainGenes, http://www.graingenes.org, is the international database for the wheat, barley, rye and oat genomes. For these species it is the primary repository for information about genetic maps, mapping probes and primers, genes, alleles and QTLs. Documentation includes such data as primer sequences, polymorphism descriptions, genotype and trait scoring data, experimental protocols used, and photographs of marker polymorphisms, disease symptoms and mutant phenotypes. These data, curated with the help of many members of the research community, are integrated with sequence and bibliographic records selected from external databases and results of BLAST searches of the ESTs. Records are linked to corresponding records in other important databases, e.g. Gramene's EST homologies to rice BAC/PACs, TIGR's Gene Indices and GenBank. In addition to this information within the GrainGenes database itself, the GrainGenes homepage at http://wheat.pw.usda.gov provides many other community resources including publications (the annual newsletters for wheat, barley and oat, monographs and articles), individual datasets (mapping and QTL studies, polymorphism surveys, variety performance evaluations), specialized databases (Triticeae repeat sequences, EST unigene sets) and pages to facilitate coordination of cooperative research efforts in specific areas such as SNP development, EST-SSRs and taxonomy. The goal is to serve as a central point for obtaining and contributing information about the genetics and biology of these cereal crops. PMID:12519977

  4. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease

    PubMed Central

    Eppig, Janan T.; Blake, Judith A.; Bult, Carol J.; Kadin, James A.; Richardson, Joel E.

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse–human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human–Mouse: Disease Connection, allows users to explore gene–phenotype–disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. PMID:25348401

  5. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    PubMed

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. PMID:25348401

  6. PhylomeDB: a database for genome-wide collections of gene phylogenies

    PubMed Central

    Huerta-Cepas, Jaime; Bueno, Anibal; Dopazo, Joaquín; Gabaldón, Toni

    2008-01-01

    The complete collection of evolutionary histories of all genes in a genome, also known as phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogentic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es PMID:17962297

  7. SoyBase, the USDA-ARS soybean genetics and genomics database.

    PubMed

    Grant, David; Nelson, Rex T; Cannon, Steven B; Shoemaker, Randy C

    2010-01-01

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The quantitative trait loci (QTL) represent more than 18 years of QTL mapping of more than 90 unique traits. SoyBase also contains the well-annotated 'Williams 82' genomic sequence and associated data mining tools. The genetic and sequence views of the soybean chromosomes and the extensive data on traits and phenotypes are extensively interlinked. This allows entry to the database using almost any kind of available information, such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the repository for controlled vocabularies for soybean growth, development and trait terms, which are also linked to the more general plant ontologies. SoyBase can be accessed at http://soybase.org. PMID:20008513

  8. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data.

    PubMed

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-09-18

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738

  9. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data

    PubMed Central

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-01-01

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738

  10. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods. PMID:25388105

  11. The PlantsP and PlantsT Functional Genomics Databases.

    PubMed

    Tchieu, Jason H; Fana, Fariba; Fink, J Lynn; Harper, Jeffrey; Nair, T Murlidharan; Niedner, R Hannes; Smith, Douglas W; Steube, Kenneth; Tam, Tobey M; Veretnik, Stella; Wang, Degeng; Gribskov, Michael

    2003-01-01

    PlantsP and PlantsT allow users to quickly gain a global understanding of plant phosphoproteins and plant membrane transporters, respectively, from evolutionary relationships to biochemical function as well as a deep understanding of the molecular biology of individual genes and their products. As one database with two functionally different web interfaces, PlantsP and PlantsT are curated plant-specific databases that combine sequence-derived information with experimental functional-genomics data. PlantsP focuses on proteins involved in the phosphorylation process (i.e., kinases and phosphatases), whereas PlantsT focuses on membrane transport proteins. Experimentally, PlantsP provides a resource for information on a collection of T-DNA insertion mutants (knockouts) in each kinase and phosphatase, primarily in Arabidopsis thaliana, and PlantsT uniquely combines experimental data regarding mineral composition (derived from inductively coupled plasma atomic emission spectroscopy) of mutant and wild-type strains. Both databases provide extensive information on motifs and domains, detailed information contributed by individual experts in their respective fields, and descriptive information drawn directly from the literature. The databases incorporate a unique user annotation and review feature aimed at acquiring expert annotation directly from the plant biology community. PlantsP is available at http://plantsp.sdsc.edu and PlantsT is available at http://plantst.sdsc.edu. PMID:12520018

  12. Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases

    NASA Astrophysics Data System (ADS)

    Jain, Anubhav; Persson, Kristin A.; Ceder, Gerbrand

    2016-05-01

    Materials innovations enable new technological capabilities and drive major societal advancements but have historically required long and costly development cycles. The Materials Genome Initiative (MGI) aims to greatly reduce this time and cost. In this paper, we focus on data reuse in the MGI and, in particular, discuss the impact of three different computational databases based on density functional theory methods to the research community. We also discuss and provide recommendations on technical aspects of data reuse, outline remaining fundamental challenges, and present an outlook on the future of MGI's vision of data sharing.

  13. Mining the Plasmodium genome database to define organellar function: what does the apicoplast do?

    PubMed Central

    Roos, David S; Crawford, Michael J; Donald, Robert G K; Fraunholz, Martin; Harb, Omar S; He, Cynthia Y; Kissinger, Jessica C; Shaw, Michael K; Striepen, Boris

    2002-01-01

    Apicomplexan species constitute a diverse group of parasitic protozoa, which are responsible for a wide range of diseases in many organisms. Despite differences in the diseases they cause, these parasites share an underlying biology, from the genetic controls used to differentiate through the complex parasite life cycle, to the basic biochemical pathways employed for intracellular survival, to the distinctive cell biology necessary for host cell attachment and invasion. Different parasites lend themselves to the study of different aspects of parasite biology: Eimeria for biochemical studies, Toxoplasma for molecular genetic and cell biological investigation, etc. The Plasmodium falciparum Genome Project contributes the first large-scale genomic sequence for an apicomplexan parasite. The Plasmodium Genome Database (http://PlasmoDB.org) has been designed to permit individual investigators to ask their own questions, even prior to formal release of the reference P. falciparum genome sequence. As a case in point, PlasmoDB has been exploited to identify metabolic pathways associated with the apicomplexan plastid, or 'apicoplast' - an essential organelle derived by secondary endosymbiosis of an alga, and retention of the algal plastid. PMID:11839180

  14. Aspergillus niger genome-wide analysis reveals a large number of novel alpha-glucan acting enzymes with unexpected expression profiles.

    PubMed

    Yuan, Xiao-Lian; van der Kaaij, Rachel M; van den Hondel, Cees A M J J; Punt, Peter J; van der Maarel, Marc J E C; Dijkhuizen, Lubbert; Ram, Arthur F J

    2008-06-01

    The filamentous ascomycete Aspergillus niger is well known for its ability to produce a large variety of enzymes for the degradation of plant polysaccharide material. A major carbon and energy source for this soil fungus is starch, which can be degraded by the concerted action of alpha-amylase, glucoamylase and alpha-glucosidase enzymes, members of the glycoside hydrolase (GH) families 13, 15 and 31, respectively. In this study we have combined analysis of the genome sequence of A. niger CBS 513.88 with microarray experiments to identify novel enzymes from these families and to predict their physiological functions. We have identified 17 previously unknown family GH13, 15 and 31 enzymes in the A. niger genome, all of which have orthologues in other aspergilli. Only two of the newly identified enzymes, a putative alpha-glucosidase (AgdB) and an alpha-amylase (AmyC), were predicted to play a role in starch degradation. The expression of the majority of the genes identified was not induced by maltose as carbon source, and not dependent on the presence of AmyR, the transcriptional regulator for starch degrading enzymes. The possible physiological functions of the other predicted family GH13, GH15 and GH31 enzymes, including intracellular enzymes and cell wall associated proteins, in alternative alpha-glucan modifying processes are discussed. PMID:18320228

  15. The FunGenES Database: A Genomics Resource for Mouse Embryonic Stem Cell Differentiation

    PubMed Central

    Adler, Priit; Aksoy, Irène; Anastassiadis, Konstantinos; Bader, Michael; Billon, Nathalie; Boeuf, Hélène; Bourillot, Pierre-Yves; Buchholz, Frank; Dani, Christian; Doss, Michael Xavier; Forrester, Lesley; Gitton, Murielle; Henrique, Domingos; Hescheler, Jürgen; Himmelbauer, Heinz; Hübner, Norbert; Karantzali, Efthimia; Kretsovali, Androniki; Lubitz, Sandra; Pradier, Laurent; Rai, Meena; Reimand, Jüri; Rolletschek, Alexandra; Sachinidis, Agapios; Savatier, Pierre; Stewart, Francis; Storm, Mike P.; Trouillas, Marina; Vilo, Jaak; Welham, Melanie J.; Winkler, Johannes; Wobus, Anna M.; Hatzopoulos, Antonis K.

    2009-01-01

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the “Functional Genomics in Embryonic Stem Cells” consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in “Expression Waves” and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells. PMID:19727443

  16. Coverage of whole proteome by structural genomics observed through protein homology modeling database

    PubMed Central

    Yamaguchi, Akihiro; Go, Mitiko

    2006-01-01

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617

  17. First whole genome based microsatellite DNA marker database of tomato for mapping and variety identification

    PubMed Central

    2013-01-01

    Background The cultivated tomato is second most consumed vegetable of the world and is an important part of a diverse and balanced diet as a rich source of vitamins, minerals, phenolic antioxidants and antioxidant lycopene having anti-cancer properties. To reap benefit of genomics of the domestic tomato (Solanum lycopersicum L.) unravelled by Tomato Genome Consortium (The Tomato Genome Consortium, 2012), the bulk mining of its markers in totality is imperative and critically required. The solgenomics has limited number of microsatellite DNA markers (2867) pertaining to solanaceae family. As these markers are of linkage map having relative distance, the choice of selected markers based on absolute distance as of physical map is missing. Only limited microsatellite markers with limitations are reported for variety identification thus there is a need for more markers supplementing DUS test and also for traceability of product in global market. Description We present here the first whole genome based microsatellite DNA marker database of tomato, TomSatDB (Tomato MicroSatellite Database) with more than 1.4 million markers mined in-silico, using MIcroSAtellite (MISA) tool. To cater the customized needs of wet lab, features with a novelty of an automated primer designing tool is added. TomSatDB (http://cabindb.iasri.res.in/tomsatdb), a user-friendly and freely accessible tool offers chromosome wise as well as location wise search of primers. It is an online relational database based on “three-tier architecture” that catalogues information of microsatellites in MySQL and user-friendly interface developed using PHP (Hypertext Pre Processor). Conclusion Besides abiotic stress, tomato is known to have biotic stress due to its susceptibility over 200 diseases caused by pathogenic fungi, bacteria, viruses and nematodes. These markers are expected to pave the way of germplasm management over abiotic and biotic stress as well as improvement through molecular breeding, leading

  18. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions

    PubMed Central

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a tendency in the study of disease-causing genetic variants. Nevertheless, there still lacks a repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  19. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a tendency in the study of disease-causing genetic variants. Nevertheless, there still lacks a repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  20. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu. PMID:25619558

  1. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  2. Genome-wide analysis of the Zn(II)2Cys6 zinc cluster-encoding gene family in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Proteins with a Zn(II)2Cys6 domain, Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overvie...

  3. InvFEST, a database integrating information of polymorphic inversions in the human genome

    PubMed Central

    Martínez-Fundichely, Alexander; Casillas, Sònia; Egea, Raquel; Ràmia, Miquel; Barbadilla, Antonio; Pantano, Lorena; Puig, Marta; Cáceres, Mario

    2014-01-01

    The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of changes and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome. PMID:24253300

  4. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    PubMed

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species. PMID:15888677

  5. InvFEST, a database integrating information of polymorphic inversions in the human genome.

    PubMed

    Martínez-Fundichely, Alexander; Casillas, Sònia; Egea, Raquel; Ràmia, Miquel; Barbadilla, Antonio; Pantano, Lorena; Puig, Marta; Cáceres, Mario

    2014-01-01

    The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of changes and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome. PMID:24253300

  6. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGESBeta

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based functionmore » assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.« less

  7. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: a new tool for integrating sequence variations and epidemiology

    PubMed Central

    2014-01-01

    Background Tuberculosis (TB) poses a worldwide threat due to advancing multidrug-resistant strains and deadly co-infections with Human immunodeficiency virus. Today large amounts of Mycobacterium tuberculosis whole genome sequencing data are being assessed broadly and yet there exists no comprehensive online resource that connects M. tuberculosis genome variants with geographic origin, with drug resistance or with clinical outcome. Description Here we describe a broadly inclusive unifying Genome-wide Mycobacterium tuberculosis Variation (GMTV) database, (http://mtb.dobzhanskycenter.org) that catalogues genome variations of M. tuberculosis strains collected across Russia. GMTV contains a broad spectrum of data derived from different sources and related to M. tuberculosis molecular biology, epidemiology, TB clinical outcome, year and place of isolation, drug resistance profiles and displays the variants across the genome using a dedicated genome browser. GMTV database, which includes 1084 genomes and over 69,000 SNP or Indel variants, can be queried about M. tuberculosis genome variation and putative associations with drug resistance, geographical origin, and clinical stages and outcomes. Conclusions Implementation of GMTV tracks the pattern of changes of M. tuberculosis strains in different geographical areas, facilitates disease gene discoveries associated with drug resistance or different clinical sequelae, and automates comparative genomic analyses among M. tuberculosis strains. PMID:24767249

  8. Expanded microbial genome coverage and improved protein family annotation in the COG database

    PubMed Central

    Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the

  9. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella

    PubMed Central

    2013-01-01

    Background The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). Description KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. Conclusions KONAGAbase provides DBM comprehensive transcriptomic

  10. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world's first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of 'mono' repeat (76.82%) over 'di' repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait of

  11. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world’s first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of ‘mono’ repeat (76.82%) over ‘di’ repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait

  12. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    PubMed

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database. PMID:27328919

  13. Database management research for the Human Genome Project: Progress report, 7/1/96-3/15/97

    SciTech Connect

    Goodman, N.

    1997-03-01

    Progress is reported on the development of software that works in conjunction with database management systems (DBMSs) in ways that are useful for genomics. This new release of LabBase has two major advantages over the previous version, namely it runs on the Sybase relational DBMS rather than ObjectStore and offers more complete data modeling features than the previous version so is suitable for more kinds of genetic databases.

  14. Complete mitochondrial genome database and standardized classification system for Canis lupus familiaris.

    PubMed

    Duleba, Anna; Skonieczna, Katarzyna; Bogdanowicz, Wiesław; Malyarchuk, Boris; Grzybowski, Tomasz

    2015-11-01

    To contribute to the complete mitogenome database of the species Canis lupus familiaris and shed more light on its origin, we have sequenced mitochondrial genomes of 120 modern dogs from worldwide populations. Together with all the previously published mitogenome sequences of acceptable quality, we have reconstructed a global phylogenetic tree of 555 C. l. familiaris mitogenomes and standardized haplogroup nomenclature. The phylogenetic tree presented here and available online at http://clf.mtdna.tree.cm.umk.pl/ could be further used by forensic and evolutionary geneticists as well cynologists, for data quality control and unambiguous haplogroup classification. Our in-depth phylogeographic analysis of all C. l. familiaris mitogenomes confirmed that domestic dogs may have originated in East Asia during the Mesolithic and Upper Paleolithic time periods and started to expand to other parts of the world during Neolithic times. PMID:26218982

  15. Uncovering the Genome-Wide Transcriptional Responses of the Filamentous Fungus Aspergillus niger to Lignocellulose Using RNA Sequencing

    PubMed Central

    Gaddipati, Sanyasi; Kokolski, Matthew; Malla, Sunir; Blythe, Martin J.; Ibbett, Roger; Campbell, Maria; Liddell, Susan; Aboobaker, Aziz; Tucker, Gregory A.; Archer, David B.

    2012-01-01

    A key challenge in the production of second generation biofuels is the conversion of lignocellulosic substrates into fermentable sugars. Enzymes, particularly those from fungi, are a central part of this process, and many have been isolated and characterised. However, relatively little is known of how fungi respond to lignocellulose and produce the enzymes necessary for dis-assembly of plant biomass. We studied the physiological response of the fungus Aspergillus niger when exposed to wheat straw as a model lignocellulosic substrate. Using RNA sequencing we showed that, 24 hours after exposure to straw, gene expression of known and presumptive plant cell wall–degrading enzymes represents a huge investment for the cells (about 20% of the total mRNA). Our results also uncovered new esterases and surface interacting proteins that might form part of the fungal arsenal of enzymes for the degradation of plant biomass. Using transcription factor deletion mutants (xlnR and creA) to study the response to both lignocellulosic substrates and low carbon source concentrations, we showed that a subset of genes coding for degradative enzymes is induced by starvation. Our data support a model whereby this subset of enzymes plays a scouting role under starvation conditions, testing for available complex polysaccharides and liberating inducing sugars, that triggers the subsequent induction of the majority of hydrolases. We also showed that antisense transcripts are abundant and that their expression can be regulated by growth conditions. PMID:22912594

  16. Resistance Gene-Guided Genome Mining: Serial Promoter Exchanges in Aspergillus nidulans Reveal the Biosynthetic Pathway for Fellutamide B, a Proteasome Inhibitor.

    PubMed

    Yeh, Hsu-Hua; Ahuja, Manmeet; Chiang, Yi-Ming; Oakley, C Elizabeth; Moore, Shauna; Yoon, Olivia; Hajovsky, Heather; Bok, Jin-Woo; Keller, Nancy P; Wang, Clay C C; Oakley, Berl R

    2016-08-19

    Fungal genome projects are revealing thousands of cryptic secondary metabolism (SM) biosynthetic gene clusters that encode pathways that potentially produce valuable compounds. Heterologous expression systems should allow these clusters to be expressed and their products obtained, but approaches are needed to identify the most valuable target clusters. The inp cluster of Aspergillus nidulans contains a gene, inpE, that encodes a proteasome subunit, leading us to hypothesize that the inp cluster produces a proteasome inhibitor and inpE confers resistance to this compound. Previous efforts to express this cluster have failed, but by sequentially replacing the promoters of the genes of the cluster with a regulatable promotor, we have expressed them successfully. Expression reveals that the product of the inp cluster is the proteasome inhibitor fellutamide B, and our data allow us to propose a biosynthetic pathway for the compound. By deleting inpE and activating expression of the inp cluster, we demonstrate that inpE is required for resistance to internally produced fellutamide B. These data provide experimental validation for the hypothesis that some fungal SM clusters contain genes that encode resistant forms of the enzymes targeted by the compound produced by the cluster. PMID:27294372

  17. Trichinella spiralis: genome database searches for the presence and immunolocalization of protein disulphide isomerase family members.

    PubMed

    Freitas, C P; Clemente, I; Mendes, T; Novo, C

    2016-01-01

    The formation of nurse cells in host muscle cells during Trichinella spiralis infection is a key step in the infective mechanism. Collagen trimerization is set up via disulphide bond formation, catalysed by protein disulphide isomerase (PDI). In T. spiralis, some PDI family members have been identified but no localization is described and no antibodies specific for T. spiralis PDIs are available. In this work, computational approaches were used to search for non-described PDIs in the T. spiralis genome database and to check the cross-reactivity of commercial anti-human antibodies with T. spiralis orthologues. In addition to a previously described PDI (PDIA2), endoplasmic reticulum protein (ERp57/PDIA3), ERp72/PDIA4, and the molecular chaperones calreticulin (CRT), calnexin (CNX) and immunoglobulin-binding protein/glucose-regulated protein (BIP/GRP78), we identified orthologues of the human thioredoxin-related-transmembrane proteins (TMX1, TMX2 and TMX3) in the genome protein database, as well as ERp44 (PDIA10) and endoplasmic reticulum disulphide reductase (ERdj5/PDIA19). Immunocytochemical staining of paraffin sections of muscle infected by T. spiralis enabled us to localize some orthologues of the human PDIs (PDIA3 and TMX1) and the chaperone GRP78. A theoretical three-dimensional model for T. spiralis PDIA3 was constructed. The localization and characteristics of the predicted linear B-cell epitopes and amino acid sequence of the immunogens used for commercial production of anti-human PDIA3 antibodies validated the use of these antibodies for the immunolocalization of T. spiralis PDIA3 orthologues. These results suggest that further study of the role of the PDIs and chaperones during nurse cell formation is desirable. PMID:25475092

  18. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  19. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  20. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  1. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component.

    PubMed

    Cherry, J Michael

    2015-12-01

    An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology. PMID:26631125

  2. CrAgDb--a database of annotated chaperone repertoire in archaeal genomes.

    PubMed

    Rani, Shikha; Srivastava, Abhishikha; Kumar, Manish; Goel, Manisha

    2016-03-01

    Chaperones are a diverse class of ubiquitous proteins that assist other cellular proteins in folding correctly and maintaining their native structure. Many different chaperones cooperate to constitute the 'proteostasis' machinery in the cells. It has been proposed earlier that archaeal organisms could be ideal model systems for deciphering the basic functioning of the 'protein folding machinery' in higher eukaryotes. Several chaperone families have been characterized in archaea over the years but mostly one protein at a time, making it difficult to decipher the composition and mechanistics of the protein folding system as a whole. In order to deal with these lacunae, we have developed a database of all archaeal chaperone proteins, CrAgDb (Chaperone repertoire in Archaeal genomes). The data have been presented in a systematic way with intuitive browse and search facilities for easy retrieval of information. Access to these curated datasets should expedite large-scale analysis of archaeal chaperone networks and significantly advance our understanding of operation and regulation of the protein folding machinery in archaea. Researchers could then translate this knowledge to comprehend the more complex protein folding pathways in eukaryotic systems. The database is freely available at http://14.139.227.92/mkumar/cragdb/. PMID:26862144

  3. The Candida Genome Database: the new homology information page highlights protein similarity and phylogeny.

    PubMed

    Binkley, Jonathan; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Shah, Prachi; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2014-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus. PMID:24185697

  4. GeMInA, Genomic Metadata for Infectious Agents, a geospatial surveillance pathogen database

    PubMed Central

    Schriml, Lynn M.; Arze, Cesar; Nadendla, Suvarna; Ganapathy, Anu; Felix, Victor; Mahurkar, Anup; Phillippy, Katherine; Gussman, Aaron; Angiuoli, Sam; Ghedin, Elodie; White, Owen; Hall, Neil

    2010-01-01

    The Gemina system (http://gemina.igs.umaryland.edu) identifies, standardizes and integrates the outbreak metadata for the breadth of NIAID category A–C viral and bacterial pathogens, thereby providing an investigative and surveillance tool describing the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] for each pathogen. The Gemina database will provide a greater understanding of the interactions of viral and bacterial pathogens with their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development, metadata curation and representative genomic sequence identification and standards development. The Gemina web interface provides metadata selection and retrieval of a pathogen's; Infection Systems (Pathogen, Host, Disease, Transmission Method and Anatomy) and Incidents (Location and Date) along with a hosts Age and Gender. The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease anchored on the taxonomic ID of the pathogen and host to identify the breadth of hosts and diseases known for these pathogens, to identify the extent of outbreak locations, and to identify unique genomic regions with the DNA Signature Insignia Detection Tool. PMID:19850722

  5. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes

    PubMed Central

    Karpinka, J. Brad; Fortriede, Joshua D.; Burns, Kevin A.; James-Zorn, Christina; Ponferrada, Virgilio G.; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M.; Vize, Peter D.

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species; all of which are mapped via unique IDs and can be queried in either a species-specific or species agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species and are displayed in Gbrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral. PMID:25313157

  6. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi. PMID:15608275

  7. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A.; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Karp, Peter D.

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46 000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. PMID:26527732

  8. Novel LanT Associated Lantibiotic Clusters Identified by Genome Database Mining

    PubMed Central

    Singh, Mangal; Sareen, Dipti

    2014-01-01

    Background Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to produce resistance, hence making them a good alternative to antibiotics. Nisin is the oldest and the most widely used lantibiotic, in food preservation, without having developed any significant resistance against it. Having their antimicrobial potential and a limited number, there is a need to identify novel lantibiotics. Methodology/Findings Identification of novel lantibiotic biosynthetic clusters from an ever increasing database of bacterial genomes, can provide a major lead in this direction. In order to achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for LanT homolog, which is a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in identification of 54 bacterial strains containing the LanT homologs, which are not the known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding for precursor peptides, modification enzyme, immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin were identified, along with 13 clusters having a single LanM determinant as in mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified which might be associated with novel bacteriocins, encoded somewhere else in the genome. Three identified gene clusters had a C39 domain containing LanT transporter, associated with the LanBC proteins and double glycine type precursor peptides, the only known example of such a cluster is that of salivaricin. Conclusion This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and 3 with LanBC genes

  9. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  10. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R.; Cherry, J. Michael

    2013-01-01

    The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186

  11. Improvements to GALA and dbERGE II: databases featuring genomic sequence alignment, annotation and experimental results.

    PubMed

    Elnitski, Laura; Giardine, Belinda; Shah, Prachi; Zhang, Yi; Riemer, Cathy; Weirauch, Matthew; Burhans, Richard; Miller, Webb; Hardison, Ross C

    2005-01-01

    We describe improvements to two databases that give access to information on genomic sequence similarities, functional elements in DNA and experimental results that demonstrate those functions. GALA, the database of Genome ALignments and Annotations, is now a set of interlinked relational databases for five vertebrate species, human, chimpanzee, mouse, rat and chicken. For each species, GALA records pairwise and multiple sequence alignments, scores derived from those alignments that reflect the likelihood of being under purifying selection or being a regulatory element, and extensive annotations such as genes, gene expression patterns and transcription factor binding sites. The user interface supports simple and complex queries, including operations such as subtraction and intersections as well as clustering and finding elements in proximity to features. dbERGE II, the database of Experimental Results on Gene Expression, contains experimental data from a variety of functional assays. Both databases are now run on the DB2 database management system. Improved hardware and tuning has reduced response times and increased querying capacity, while simplified query interfaces will help direct new users through the querying process. Links are available at http://www.bx.psu.edu/. PMID:15608239

  12. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dreher, Kate; Fulcher, Carol A.; Subhraveti, Pallavi; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Pujar, Anuradha; Shearer, Alexander G.; Travers, Michael; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2012-01-01

    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups. PMID:22102576

  13. dbSUPER: a database of super-enhancers in mouse and human genome

    PubMed Central

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web-servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538

  14. dbSUPER: a database of super-enhancers in mouse and human genome.

    PubMed

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web-servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538

  15. Gene evolution and gene expression after whole genome duplication in fish: the PhyloFish database.

    PubMed

    Pasquier, Jeremy; Cabau, Cédric; Nguyen, Thaovi; Jouanno, Elodie; Severac, Dany; Braasch, Ingo; Journot, Laurent; Pontarotti, Pierre; Klopp, Christophe; Postlethwait, John H; Guiguen, Yann; Bobe, Julien

    2016-01-01

    With more than 30,000 species, ray-finned fish represent approximately half of vertebrates. The evolution of ray-finned fish was impacted by several whole genome duplication (WGD) events including a teleost-specific WGD event (TGD) that occurred at the root of the teleost lineage about 350 million years ago (Mya) and more recent WGD events in salmonids, carps, suckers and others. In plants and animals, WGD events are associated with adaptive radiations and evolutionary innovations. WGD-spurred innovation may be especially relevant in the case of teleost fish, which colonized a wide diversity of habitats on earth, including many extreme environments. Fish biodiversity, the use of fish models for human medicine and ecological studies, and the importance of fish in human nutrition, fuel an important need for the characterization of gene expression repertoires and corresponding evolutionary histories of ray-finned fish genes. To this aim, we performed transcriptome analyses and developed the PhyloFish database to provide (i) de novo assembled gene repertoires in 23 different ray-finned fish species including two holosteans (i.e. a group that diverged from teleosts before TGD) and 21 teleosts (including six salmonids), and (ii) gene expression levels in ten different tissues and organs (and embryos for many) in the same species. This resource was generated using a common deep RNA sequencing protocol to obtain the most exhaustive gene repertoire possible in each species that allows between-species comparisons to study the evolution of gene expression in different lineages. The PhyloFish database described here can be accessed and searched using RNAbrowse, a simple and efficient solution to give access to RNA-seq de novo assembled transcripts. PMID:27189481

  16. MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information

    PubMed Central

    2013-01-01

    Background A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. Results Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. Conclusions MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. The MetaPathways software package

  17. Database management research for the Human Genome Project. Final progress report for period: 02/01/99 - 06/14/00

    SciTech Connect

    Bult, Carol J.

    1999-11-01

    The MouseBLAST server allows researchers to search a sequence within mouse/rodent sequence databases to find matching sequences that may be associated with mouse genes. Query results may be linked to gene detail records in the Mouse Genome Database (MGD). Searches are performed using WU-BLAST 2.0. All sequence databases are updated on a weekly basis.

  18. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus

    PubMed Central

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potent carcinogenic metabolites produced from the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp as compared to seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding, potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research. PMID:26366857

  19. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus.

    PubMed

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potent carcinogenic metabolites produced from the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp as compared to seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding, potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research. PMID:26366857

  20. Improving the quality of genome, protein sequence, and taxonomy databases: a prerequisite for microbiome meta-omics 2.0.

    PubMed

    Pible, Olivier; Armengaud, Jean

    2015-10-01

    High-throughput shotgun metaproteomic approaches on environmental or medical microbiomes are producing huge amounts of tandem mass spectrometry data. These can be interpreted either with a general protein sequence database comprising tens of thousands of sequenced genomes or with a more customized database such as those obtained after metagenome sequencing of the DNA extracted from the same sample. However, not all entries in a nucleotide or protein sequence database are of equal quality and this can critically impact metaproteomic data interpretation. In this viewpoint article, we exemplify several key issues. First, either genome or transcriptome data interpretation due to inaccurate contig assembly and gene prediction may be erroneous, for its mitigation the metaproteogenomic strategies could have an interesting perspective. Errors in sample handling and taxonomical characterization may also be problematic. Cross-contamination of genome sequences is also underestimated while frequent. As a consequence of these structural errors regarding protein sequences and additional problems due to homology-based functional annotation of proteins, specific efforts for better interpretation of metaproteomic data are required. We propose the development of new bioinformatic pipelines devoted to detection and correction of errors and contaminations to improve the overall quality of sequence and taxonomy databases for metaproteomics. PMID:26038180

  1. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  2. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. PMID:25281234

  3. Genome-wide transcriptome analysis of cotton (Gossypium hirsutum L.) identifies candidate gene signatures in response to aflatoxin producing fungus Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic metabolites and potent carcinogen produced from asexual fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. U.S. federal regulations restrict the use of aflatoxin contaminated cottonseed at >20...

  4. A Genome-Scale Database and Reconstruction of Caenorhabditis elegans Metabolism.

    PubMed

    Gebauer, Juliane; Gentsch, Christoph; Mansfeld, Johannes; Schmeißer, Kathrin; Waschina, Silvio; Brandes, Susanne; Klimmasch, Lukas; Zamboni, Nicola; Zarse, Kim; Schuster, Stefan; Ristow, Michael; Schäuble, Sascha; Kaleta, Christoph

    2016-05-25

    We present a genome-scale model of Caenorhabditis elegans metabolism along with the public database ElegCyc (http://elegcyc.bioinf.uni-jena.de:1100), which represents a reference for metabolic pathways in the worm and allows for the visualization as well as analysis of omics datasets. Our model reflects the metabolic peculiarities of C. elegans that make it distinct from other higher eukaryotes and mammals, including mice and humans. We experimentally verify one of these peculiarities by showing that the lifespan-extending effect of L-tryptophan supplementation is dose dependent (hormetic). Finally, we show the utility of our model for analyzing omics datasets through predicting changes in amino acid concentrations after genetic perturbations and analyzing metabolic changes during normal aging as well as during two distinct, reactive oxygen species (ROS)-related lifespan-extending treatments. Our analyses reveal a notable similarity in metabolic adaptation between distinct lifespan-extending interventions and point to key pathways affecting lifespan in nematodes. PMID:27211858

  5. The Portable Dictionary of the Mouse Genome: a personal database for gene mapping and molecular biology.

    PubMed

    Williams, R W

    1994-06-01

    The Portable Dictionary of the Mouse Genome is a database for personal computers that contains information on approximately 10,000 loci in the mouse, along with data on homologs in several other mammalian species, including human, rat, cat, cow, and pig. Key features of the dictionary are its compact size, its network independence, and the ability to convert the entire dictionary to a wide variety of common application programs. Another significant feature is the integration of DNA sequence accession data. Loci in the dictionary can be rapidly resorted by chromosomal position, by type, by human homology, or by gene effect. The dictionary provides an accessible, easily manipulated set of data that has many uses--from a quick review of loci and gene nomenclature to the design of experiments and analysis of results. The Portable Dictionary is available in several formats suitable for conversion to different programs and computer systems. It can be obtained on disk or from Internet Gopher servers (mickey.utmen.edu or anat4.utmen.edu), an anonymous FTP site (nb.utmem.edu in the directory pub/genedict), and a World Wide Web server (http://mickey.utmem.edu/front.html). PMID:8043953

  6. Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era.

    PubMed

    Hu, Zhi-Liang; Park, Carissa A; Wu, Xiao-Lin; Reecy, James M

    2013-01-01

    The Animal QTL database (QTLdb; http://www.animalgenome.org/QTLdb) is designed to house all publicly available QTL and single-nucleotide polymorphism/gene association data on livestock animal species. An earlier version was published in the Nucleic Acids Research Database issue in 2007. Since then, we have continued our efforts to develop new and improved database tools to allow more data types, parameters and functions. Our efforts have transformed the Animal QTLdb into a tool that actively serves the research community as a quality data repository and more importantly, a provider of easily accessible tools and functions to disseminate QTL and gene association information. The QTLdb has been heavily used by the livestock genomics community since its first public release in 2004. To date, there are 5920 cattle, 3442 chicken, 7451 pigs, 753 sheep and 88 rainbow trout data points in the database, and at least 290 publications that cite use of the database. The rapid advancement in genomic studies of cattle, chicken, pigs, sheep and other livestock animals has presented us with challenges, as well as opportunities for the QTLdb to meet the evolving needs of the research community. Here, we report our progress over the recent years and highlight new functions and services available to the general public. PMID:23180796

  7. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes

    PubMed Central

    Chan, Patricia P.; Lowe, Todd M.

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/. PMID:26673694

  8. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms.

    PubMed

    Nagai, Yoko; Takahashi, Yasuko; Imanishi, Tadashi

    2015-01-01

    Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information of sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genome Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases. PMID:25361969

  9. Shanghai RAPESEED Database: a resource for functional genomics studies of seed development and fatty acid metabolism of Brassica

    PubMed Central

    Wu, Guo-Zhang; Shi, Qiu-Ming; Niu, Ya; Xing, Mei-Qing; Xue, Hong-Wei

    2008-01-01

    The Shanghai RAPESEED Database (RAPESEED, http://rapeseed.plantsignal.cn/) was created to provide the solid platform for functional genomics studies of oilseed crops with the emphasis on seed development and fatty acid metabolism. The RAPESEED includes the resource of 8462 unique ESTs, of which 3526 clones are with full length cDNA; the expression profiles of 8095 genes and the Serial Analysis of Gene Expression (SAGE, 23 895 unique tags) and tag-to-gene data during seed development. In addition, a total of ∼14 700 M3 mutant populations were generated by ethylmethanesulfonate (EMS) mutagenesis and related seed quality information was determined using the Foss NIR System. Further, the TILLING (Targeting Induced Local Lesions IN Genomes) platform was established based on the generated EMS mutant population. The relevant information was collected in RAPESEED database, which can be searched through keywords, nucleotide or protein sequences, or seed quality parameters, and downloaded. PMID:17916574

  10. BactPepDB: a database of predicted peptides from a exhaustive survey of complete prokaryote genomes

    PubMed Central

    Rey, Julien; Deschavanne, Patrick; Tuffery, Pierre

    2014-01-01

    With the recent progress in complete genome sequencing, mining the increasing amount of genomic information available should in theory provide the means to discover new classes of peptides. However, annotation pipelines often do not consider small reading frames likely to be expressed. BactPepDB, available online at http://bactpepdb.rpbs.univ-paris-diderot.fr, is a database that aims at providing an exhaustive re-annotation of all complete prokaryotic genomes—chromosomal and plasmid DNA—available in RefSeq for coding sequences ranging between 10 and 80 amino acids. The identified peptides are classified as (i) previously identified in RefSeq, (ii) entity-overlapping (intragenic) or intergenic, and (iii) potential pseudogenes—intergenic sequences corresponding to a portion of a previously annotated larger gene. Additional information is related to homologs within order, predicted signal sequence, transmembrane segments, disulfide bonds, secondary structure, and the existence of a related 3D structure in the Protein Databank. As a result, BactPepDB provides insights about candidate peptides, and provides information about their conservation, together with some of their expected biological/structural features. The BactPepDB interface allows to search for candidate peptides in the database, or to search for peptides similar to a query, according to the multiple properties predicted or related to genomic localization. Database URL: http://www.yeastgenome.org/ PMID:25377257

  11. Tractor_DB (version 2.0): a database of regulatory interactions in gamma-proteobacterial genomes.

    PubMed

    Pérez, Abel González; Angarica, Vladimir Espinosa; Vasconcelos, Ana Tereza R; Collado-Vides, Julio

    2007-01-01

    The version 2.0 of Tractor_DB is now accessible at its three international mirrors: www.bioinfo.cu/Tractor_DB, www.tractor.lncc.br and http://www.ccg.unam.mx/tractorDB. This database contains a collection of computationally predicted Transcription Factors' binding sites in gamma-proteobacterial genomes. These data should aid researchers in the design of microarray experiments and the interpretation of their results. They should also facilitate studies of Comparative Genomics of the regulatory networks of this group of organisms. In this paper we describe the main improvements incorporated to the database in the past year and a half which include incorporating information on the regulatory networks of 13-increasing to 30-new gamma-proteobacteria and developing a new computational strategy to complement the putative sites identified by the original weight matrix-based approach. We have also added dynamically generated navigation tabs to the navigation interfaces. Moreover, we developed a new interface that allows users to directly retrieve information on the conservation of regulatory interactions in the 30 genomes included in the database by navigating a map that represents a core of the known Escherichia coli regulatory network. PMID:17088283

  12. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome

    PubMed Central

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A.

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314

  13. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314

  14. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full-advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  15. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases

    PubMed Central

    Sanderson, Lacey-Anne; Ficklin, Stephen P.; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A.; Bett, Kirstin E.; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/ PMID:24163125

  16. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

    PubMed

    Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/. PMID:24163125

  17. Fungal plant cell wall-degrading enzyme database: a platform for comparative and evolutionary genomics in fungi and Oomycetes

    PubMed Central

    2013-01-01

    Background Plant cell wall-degrading enzymes (PCWDEs) play significant roles throughout the fungal life including acquisition of nutrients and decomposition of plant cell walls. In addition, many of PCWDEs are also utilized by biofuel and pulp industries. In order to develop a comparative genomics platform focused in fungal PCWDEs and provide a resource for evolutionary studies, Fungal PCWDE Database (FPDB) is constructed (http://pcwde.riceblast.snu.ac.kr/). Results In order to archive fungal PCWDEs, 22 sequence profiles were constructed and searched on 328 genomes of fungi, Oomycetes, plants and animals. A total of 6,682 putative genes encoding PCWDEs were predicted, showing differential distribution by their life styles, host ranges and taxonomy. Genes known to be involved in fungal pathogenicity, including polygalacturonase (PG) and pectin lyase, were enriched in plant pathogens. Furthermore, crop pathogens had more PCWDEs than those of rot fungi, implying that the PCWDEs analysed in this study are more needed for invading plant hosts than wood-decaying processes. Evolutionary analysis of PGs in 34 selected genomes revealed that gene duplication and loss events were mainly driven by taxonomic divergence and partly contributed by those events in species-level, especially in plant pathogens. Conclusions The FPDB would provide a fungi-specialized genomics platform, a resource for evolutionary studies of PCWDE gene families and extended analysis option by implementing Favorite, which is a data exchange and analysis hub built in Comparative Fungal Genomics Platform (CFGP 2.0; http://cfgp.snu.ac.kr/). PMID:24564786

  18. Conserved Secondary Structures in Aspergillus

    PubMed Central

    McGuire, Abigail Manson; Galagan, James E.

    2008-01-01

    Background Recent evidence suggests that the number and variety of functional RNAs (ncRNAs as well as cis-acting RNA elements within mRNAs ) is much higher than previously thought; thus, the ability to computationally predict and analyze RNAs has taken on new importance. We have computationally studied the secondary structures in an alignment of six Aspergillus genomes. Little is known about the RNAs present in this set of fungi, and this diverse set of genomes has an optimal level of sequence conservation for observing the correlated evolution of base-pairs seen in RNAs. Methodology/Principal Findings We report the results of a whole-genome search for evolutionarily conserved secondary structures, as well as the results of clustering these predicted secondary structures by structural similarity. We find a total of 7450 predicted secondary structures, including a new predicted ∼60 bp long hairpin motif found primarily inside introns. We find no evidence for microRNAs. Different types of genomic regions are over-represented in different classes of predicted secondary structures. Exons contain the longest motifs (primarily long, branched hairpins), 5′ UTRs primarily contain groupings of short hairpins located near the start codon, and 3′ UTRs contain very little secondary structure compared to other regions. There is a large concentration of short hairpins just inside the boundaries of exons. The density of predicted intronic RNAs increases with the length of introns, and the density of predicted secondary structures within mRNA coding regions increases with the number of introns in a gene. Conclusions/Sigificance There are many conserved, high-confidence RNAs of unknown function in these Aspergillus genomes, as well as interesting spatial distributions of predicted secondary structures. This study increases our knowledge of secondary structure in these aspergillus organisms. PMID:18665251

  19. The FlyBase database of the Drosophila genome projects andcommunity literature

    SciTech Connect

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang,Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder,Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner,Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn,Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman,Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis,Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik,Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy,Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D.melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  20. The two genome sequence release and blast server construction for aflatoxin-producing L and S strains Aspergillus parasiticus and A. flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic and carcinogenic secondary metabolites. These compounds, produced by Aspergillus flavus and A. parasiticus, contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed...

  1. PlanTE-MIR DB: a database for transposable element-related microRNAs in plant genomes.

    PubMed

    R Lorenzetti, Alan P; A de Antonio, Gabriel Y; Paschoal, Alexandre R; Domingues, Douglas S

    2016-05-01

    Transposable elements (TEs) comprise a major fraction of many plant genomes and are known to drive their organization and evolution. Several studies show that these repetitive elements have a prominent role in shaping noncoding regions of the genome such as microRNA (miRNA) loci, which are components of post-transcriptional regulation mechanisms. Although some studies have reported initial formation of miRNA loci from TE sequences, especially in model plants, the approaches that were used did not employ systems that would allow results to be delivered by a user-friendly database. In this study, we identified 152 precursor miRNAs overlapping TEs in 10 plant species. PlanTE-MIR DB was designed to assemble this data and deliver it to the scientific community interested in miRNA origin, evolution, and regulation pathways. Users can browse the database through a web interface and search for entries using various parameters. This resource is cross-referenced with repetitive element (Repbase Update) and miRNA (miRBase) repositories, where sequences can be checked for further analysis. All data in PlanTE-MIR DB are publicly available for download in several file formats to facilitate their understanding and use. The database is hosted at http://bioinfo-tool.cp.utfpr.edu.br/plantemirdb/ . PMID:26887375

  2. VitisExpDB: A Database Resource for Grape Functional Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for Vitis vinifera and non-vinifera grape varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation details and gene ontology based...

  3. ASPERGILLUS BOMBYCIS GENOTYPES (RFLP) FROM SILKWORM CULTIVATION

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Eighteen isolates of Aspergillus bombycis from samples of dust, insect frass, and soil collected from 8 silkworm rearing facilities in Japan, as well as single silkworm rearing facilities in Indonesia and Malaysia, were subjected to DNA fingerprinting. PstI digests of total genomic DNA from each is...

  4. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    PubMed

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877

  5. The Littorina sequence database (LSD)--an online resource for genomic data.

    PubMed

    Canbäck, Björn; André, Carl; Galindo, Juan; Johannesson, Kerstin; Johansson, Tomas; Panova, Marina; Tunlid, Anders; Butlin, Roger

    2012-01-01

    We present an interactive, searchable expressed sequence tag database for the periwinkle snail Littorina saxatilis, an upcoming model species in evolutionary biology. The database is the result of a hybrid assembly between Sanger and 454 sequences, 1290 and 147,491 sequences respectively. Normalized and non-normalized cDNA was obtained from different ecotypes of L. saxatilis collected in the UK and Sweden. The Littorina sequence database (LSD) contains 26,537 different contigs, of which 2453 showed similarity with annotated proteins in UniProt. Querying the LSD permits the selection of the taxonomic origin of blast hits for each contig, and the search can be restricted to particular taxonomic groups. The database allows access to UniProt annotations, blast output, protein family domains (PFAM) and Gene Ontology. The database will allow users to search for genetic markers and identifying candidate genes or genes for expression analyses. It is open for additional deposition of sequence information for L. saxatilis and other species of the genus Littorina. The LSD is available at http://mbio-serv2.mbioekol.lu.se/Littorina/. PMID:21707958

  6. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    PubMed Central

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  7. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database.

    PubMed

    Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD;http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL:http://rgd.mcw.edu. PMID:27009807

  8. The Disease Portals, disease–gene annotation and the RGD disease ontology at the Rat Genome Database

    PubMed Central

    Hayman, G. Thomas; Laulederkind, Stanley J. F.; Smith, Jennifer R.; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene–disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu PMID:27009807

  9. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    DOE PAGESBeta

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islandsmore » in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.« less

  10. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    SciTech Connect

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.

  11. Aspergillus antigen skin test (image)

    MedlinePlus

    The aspergillus antigen skin test determines whether or not a person has been exposed to the mold aspergillus. It is performed by injecting an aspergillus antigen under the skin with a needle. After 48 ...

  12. Design and implementation of a twin-family database for behavior genetics and genomics studies.

    PubMed

    Boomsma, Dorret I; Willemsen, Gonneke; Vink, Jacqueline M; Bartels, Meike; Groot, Paul; Hottenga, Jouke Jan; van Beijsterveldt, C E M Toos; Stroet, Therese; van Dijk, Rob; Wertheim, Rien; Visser, Marco; van der Kleij, Frank

    2008-06-01

    In this article we describe the design and implementation of a database for extended twin families. The database does not focus on probands or on index twins, as this approach becomes problematic when larger multigenerational families are included, when more than one set of multiples is present within a family, or when families turn out to be part of a larger pedigree. Instead, we present an alternative approach that uses a highly flexible notion of persons and relations. The relations among the subjects in the database have a one-to-many structure, are user-definable and extendible and support arbitrarily complicated pedigrees. Some additional characteristics of the database are highlighted, such as the storage of historical data, predefined expressions for advanced queries, output facilities for individuals and relations among individuals and an easy-to-use multi-step wizard for contacting participants. This solution presents a flexible approach to accommodate pedigrees of arbitrary size, multiple biological and nonbiological relationships among participants and dynamic changes in these relations that occur over time, which can be implemented for any type of multigenerational family study. PMID:18498212

  13. PLEXdb: Plant and Pathogen Expression Database and Tools for Comparative and Functional Genomics Analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    PLEXdb is a plant expression database that supports all Affymetrix microarray designs for plants and plant pathogens. PLEXdb provides annotation and hand-curated microarray data. Experiments deposited in PLEXdb are checked for MIAME/Plant compliance and completeness, then processed by normalizing th...

  14. An Integrated Database for Grass and Endophyte Genomics at www.grassendophyte.org

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The endophytic microbes are able to promote plant growth and health under various stresses via their symbiotic association with host plants. Genome-wide comparative analysis has been extensively employed to decipher complex mechanisms of interactions between endophytic microbes and host plants, resu...

  15. Panzea: A Database and Resource for Molecular and Functional Diversity in the Maize Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Maize, a classical model for genetic studies, is an important agronomic crop, and the genome is known to contain many natural differences in the DNA between different strains. On average, two randomly chosen maize lines have an average of one single nucleotide polymorphism (SNP) every ~100 bp; this...

  16. Aspergillus spinal epidural abscess

    SciTech Connect

    Byrd, B.F. III; Weiner, M.H.; McGee, Z.A.

    1982-12-17

    A spinal epidural abscess developed in a renal transplant recipient; results of a serum radioimmunoassay for Aspergillus antigen were positive. Laminectomy disclosed an abscess of the L4-5 interspace and L-5 vertebral body that contained hyphal forms and from which Aspergillus species was cultured. Serum Aspergillus antigen radioimmunoassay may be a valuable, specific early diagnostic test when systemic aspergillosis is a consideration in an immunosuppressed host.

  17. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crop with global production ranking firstly among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/. PMID:24489955

  18. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  19. Aspergillus fumigatus and Aspergillosis

    PubMed Central

    Latgé, Jean-Paul

    1999-01-01

    Aspergillus fumigatus is one of the most ubiquitous of the airborne saprophytic fungi. Humans and animals constantly inhale numerous conidia of this fungus. The conidia are normally eliminated in the immunocompetent host by innate immune mechanisms, and aspergilloma and allergic bronchopulmonary aspergillosis, uncommon clinical syndromes, are the only infections observed in such hosts. Thus, A. fumigatus was considered for years to be a weak pathogen. With increases in the number of immunosuppressed patients, however, there has been a dramatic increase in severe and usually fatal invasive aspergillosis, now the most common mold infection worldwide. In this review, the focus is on the biology of A. fumigatus and the diseases it causes. Included are discussions of (i) genomic and molecular characterization of the organism, (ii) clinical and laboratory methods available for the diagnosis of aspergillosis in immunocompetent and immunocompromised hosts, (iii) identification of host and fungal factors that play a role in the establishment of the fungus in vivo, and (iv) problems associated with antifungal therapy. PMID:10194462

  20. GELBANK : A database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes.

    SciTech Connect

    Babnigg, G.; Giometti, C. S.; Biosciences Division

    2004-01-01

    GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyro coccus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans.

  1. MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline

    PubMed Central

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-01-01

    Mitofish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518

  2. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline.

    PubMed

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-11-01

    Mitofish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518

  3. The CATH database: an extended protein family resource for structural and functional genomics

    PubMed Central

    Pearl, F. M. G.; Bennett, C. F.; Bray, J. E.; Harrison, A. P.; Martin, N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. A.

    2003-01-01

    The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB) contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domains boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10–20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates. PMID:12520050

  4. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    Background Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model’s derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. Conclusion Significant

  5. Hybridization between Aspergillus flavus and Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    To date the sexual stages or teleomorphs have been described for three aflatoxigenic species in Aspergillus section Flavi: Petromyces flavus, P. parasiticus and P. nomius. In this study we examined the possibility of interspecific matings between A. flavus and A. parasiticus. These species can b...

  6. Cloning and characterization of three Aspergillus niger promoters.

    PubMed

    Luo, X

    1995-09-22

    An Aspergillus niger (An) genomic library was constructed using the promoter-trap vector, pLX2A, which contains a hygromycin B (Hy) phosphotransferase-encoding gene (hph) for selection of DNA fragments with promoter activity. This library was transformed in Escherichia coli and 80,000 colonies were obtained, 94% of which contained inserts. Transformations of plasmid DNA from the library into An resulted in 53 Hy-resistant (HyR) colonies. Southern blot analysis of 21 transformants confirmed the integration of hph into the An genome. Using the sib selection procedure, three functional promoters, PX6, PX18 and PX21, were identified from this library. Both DNA strands of all three fragments were sequenced and their sequences showed no significant homology to those in the database. Comparison of the sequences of all known promoters from An suggested that C+T-rich stretches are probably important for promoter structures. The promoter activity was analysed further using beta-galactosidase (beta Gal) as a quantitative marker. The results suggest that while PX21 is a much stronger promoter than the known alpha-amylase promoter of A. oryzae, PX6 promotes only weak expression of beta Gal. PMID:7557461

  7. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa

    PubMed Central

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa. PMID:22123747

  8. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9

    PubMed Central

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-01-01

    Motivation: CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. Results: We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability and implementation: Free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153724

  9. DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes.

    PubMed

    Wang, Dapeng

    2016-01-15

    The conservation of gene organization in the genome with lineage-specificity is an invaluable resource to decipher their potential functionality with diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal structure for such kind of gene clusters that tend to reside in their preferred locations, representing the distinctive genomic characteristics in single species or a given lineage. Despite gene families having been investigated in a widespread manner, the definition of gene pair families in various taxa still lacks adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/) that stores the pre-calculated lineage-based gene pairs in currently available 134 animal and plant genomes and inspect them under the same analytical framework, bringing out a set of innovational features. First, the taxonomy or lineage has been classified into four levels such as Kingdom, Phylum, Class and Order. It adopts all-to-all comparison strategy to identify the possible conserved gene pairs in all species for each gene pair in certain species and reckon those that are conserved in over a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as the lineage-conserved gene pairs. Furthermore, it predicts the lineage-specific gene pairs by retaining the above-mentioned lineage-conserved gene pairs that are not conserved in any other lineages. Second, it carries out pairwise comparison for the gene pairs between two compared species and creates the table including all the conserved gene pairs and the image elucidating the conservation degree of gene pairs in chromosomal level. Third, it supplies gene order browser to extend gene pairs to gene clusters, allowing users to view the evolution dynamics in the gene context in an intuitive manner. This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and

  10. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  11. The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data

    PubMed Central

    Laulederkind, Stanley J. F.; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Kowalski, George; Liu, Weisong; Rood, Wes; Munzenmaier, Diane H.; Dwinell, Melinda R.; Twigger, Simon N.; Jacob, Howard J.

    2011-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records as well as human and mouse orthologs, 1771 rat and 1911 human quantitative trait loci (QTLs) and 2209 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. A suite of tools has been developed to aid curators in acquiring and validating data objects, assigning nomenclature, attaching biological information to objects and making connections among data types. The software used to assign nomenclature, to create and edit objects and to make annotations to the data objects has been specifically designed to make the curation process as fast and efficient as possible. The user interfaces have been adapted to the work routines of the curators, creating a suite of tools that is intuitive and powerful. Database URL: http://rgd.mcw.edu PMID:21321022

  12. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    SciTech Connect

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, base upon statistical inferences, the location of matrix attachment regions (MARS) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.

  13. GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database

    PubMed Central

    Leslie, Richard; O’Donnell, Christopher J.; Johnson, Andrew D.

    2014-01-01

    Summary: We created a deeply extracted and annotated database of genome-wide association studies (GWAS) results. GRASP v1.0 contains >6.2 million SNP-phenotype association from among 1390 GWAS studies. We re-annotated GWAS results with 16 annotation sources including some rarely compared to GWAS results (e.g. RNAediting sites, lincRNAs, PTMs). Motivation: To create a high-quality resource to facilitate further use and interpretation of human GWAS results in order to address important scientific questions. Results: GWAS have grown exponentially, with increases in sample sizes and markers tested, and continuing bias toward European ancestry samples. GRASP contains >100 000 phenotypes, roughly: eQTLs (71.5%), metabolite QTLs (21.2%), methylation QTLs (4.4%) and diseases, biomarkers and other traits (2.8%). cis-eQTLs, meQTLs, mQTLs and MHC region SNPs are highly enriched among significant results. After removing these categories, GRASP still contains a greater proportion of studies and results than comparable GWAS catalogs. Cardiovascular disease and related risk factors pre-dominate remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR = 0.18[0.16-0.20]) and Y (OR = 0.003[0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P ≤ 0.05) levels. We show this gene-length correlation decays at increasingly more stringent P-value thresholds. Potential pleotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date. Conclusion: Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new

  14. New species of Aspergillus producing sterigmatocystin.

    PubMed Central

    Rabie, C J; Steyn, M; van Schalkwyk, G C

    1977-01-01

    A number of species belonging to the genus Aspergillus were evaluated for their toxicity to ducklings and the ability to produce sterigmatocystin. Three new species capable of producing sterigmatocystin were found, namely, Aspergillus aurantio-brunneus, Aspergillus quadrilineatus, and Aspergillus ustus. All three were toxic to ducklings. The production of sterigmatocystin by Aspergillus rugulosus was confirmed, and the toxicity of Aspergillus stellatus and Aspergillus multicolor is described. PMID:406838

  15. Development in Aspergillus

    PubMed Central

    Krijgsheld, P.; Bleichrodt, R.; van Veluw, G.J.; Wang, F.; Müller, W.H.; Dijksterhuis, J.; Wösten, H.A.B.

    2013-01-01

    The genus Aspergillus represents a diverse group of fungi that are among the most abundant fungi in the world. Germination of a spore can lead to a vegetative mycelium that colonizes a substrate. The hyphae within the mycelium are highly heterogeneous with respect to gene expression, growth, and secretion. Aspergilli can reproduce both asexually and sexually. To this end, conidiophores and ascocarps are produced that form conidia and ascospores, respectively. This review describes the molecular mechanisms underlying growth and development of Aspergillus. PMID:23450714

  16. Genomics and Health Impact Update

    MedlinePlus

    ... Genomics in Practice Newborn Screening Pharmacogenomics Reproductive Health Tools and Databases About the Genomics & Health Impact Update The Office of Public Health Genomics provides updated and credible ...

  17. An integrated database of genes responsive to the Myc oncogenic transcription factor: identification of direct genomic targets

    PubMed Central

    Zeller, Karen I; Jegga, Anil G; Aronow, Bruce J; O'Donnell, Kathryn A; Dang, Chi V

    2003-01-01

    We report a database of genes responsive to the Myc oncogenic transcription factor. The database Myc Target Gene prioritizes candidate target genes according to experimental evidence and clusters responsive genes into functional groups. We coupled the prioritization of target genes with phylogenetic sequence comparisons to predict c-Myc target binding sites, which are in turn validated by chromatin immunoprecipitation assays. This database is essential for the understanding of the genetic regulatory networks underlying the genesis of cancers. PMID:14519204

  18. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  19. A novel non-thermostable deuterolysin from Aspergillus oryzae.

    PubMed

    Maeda, Hiroshi; Katase, Toru; Sakai, Daisuke; Takeuchi, Michio; Kusumoto, Ken-Ichi; Amano, Hitoshi; Ishida, Hiroki; Abe, Keietsu; Yamagata, Youhei

    2016-09-01

    Three putative deuterolysin (EC 3.4.24.29) genes (deuA, deuB, and deuC) were found in the Aspergillus oryzae genome database ( http://www.bio.nite.go.jp/dogan/project/view/AO ). One of these genes, deuA, was corresponding to NpII gene, previously reported. DeuA and DeuB were overexpressed by recombinant A. oryzae and were purified. The degradation profiles against protein substrates of both enzymes were similar, but DeuB showed wider substrate specificity against peptidyl MCA-substrates compared with DeuA. Enzymatic profiles of DeuB except for thermostability also resembled those of DeuA. DeuB was inactivated by heat treatment above 80° C, different from thermostable DeuA. Transcription analysis in wild type A. oryzae showed only deuB was expressed in liquid culture, and the addition of the proteinous substrate upregulated the transcription. Furthermore, the NaNO3 addition seems to eliminate the effect of proteinous substrate for the transcription of deuB. PMID:27050120

  20. DOR – a Database of Olfactory Receptors – Integrated Repository for Sequence and Secondary Structural Information of Olfactory Receptors in Selected Eukaryotic Genomes

    PubMed Central

    Nagarathnam, Balasubramanian; Karpe, Snehal D; Harini, Krishnan; Sankar, Kannan; Iftekhar, Mohammed; Rajesh, Durairaj; Giji, Sadasivam; Archunan, Govidaraju; Balakrishnan, Veluchamy; Gromiha, M Michael; Nemoto, Wataru; Fukui, Kazhuhiko; Sowdhamini, Ramanathan

    2014-01-01

    Olfaction is the response to odors and is mediated by a class of membrane-bound proteins called olfactory receptors (ORs). An understanding of these receptors serves as a good model for basic signal transduction mechanisms and also provides important clues for the strategies adopted by organisms for their ultimate survival using chemosensory perception in search of food or defense against predators. Prior research on cross-genome phylogenetic analyses from our group motivated the addressal of conserved evolutionary trends, clustering, and ortholog prediction of ORs. The database of olfactory receptors (DOR) is a repository that provides sequence and structural information on ORs of selected organisms (such as Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens). Users can download OR sequences, study predicted membrane topology, and obtain cross-genome sequence alignments and phylogeny, including three-dimensional (3D) structural models of 100 selected ORs and their predicted dimer interfaces. The database can be accessed from http://caps.ncbs.res.in/DOR. Such a database should be helpful in designing experiments on point mutations to probe into the possible dimerization modes of ORs and to even understand the evolutionary changes between different receptors. PMID:25002814

  1. New taxa in Aspergillus section Usti

    PubMed Central

    Samson, R.A.; Varga, J.; Meijer, M.; Frisvad, J.C.

    2011-01-01

    Based on phylogenetic analysis of sequence data, Aspergillus section Usti includes 21 species, inclucing two teleomorphic species Aspergillus heterothallicus (= Emericella heterothallica) and Fennellia monodii. Aspergillus germanicus sp. nov. was isolated from indoor air in Germany. This species has identical ITS sequences with A. insuetus CBS 119.27, but is clearly distinct from that species based on β-tubulin and calmodulin sequence data. This species is unable to grow at 37 °C, similarly to A. keveii and A. insuetus. Aspergillus carlsbadensis sp. nov. was isolated from the Carlsbad Caverns National Park in New Mexico. This taxon is related to, but distinct from a clade including A. calidoustus, A. pseudodeflectus, A. insuetus and A. keveii on all trees. This species is also unable to grow at 37 °C, and acid production was not observed on CREA. Aspergillus californicus sp. nov. is proposed for an isolate from chamise chaparral (Adenostoma fasciculatum) in California. It is related to a clade including A. subsessilis and A. kassunensis on all trees. This species grew well at 37 °C, and acid production was not observed on CREA. The strain CBS 504.65 from soil in Turkey showed to be clearly distinct from the A. deflectus ex-type strain, indicating that this isolate represents a distinct species in this section. We propose the name A. turkensis sp. nov. for this taxon. This species grew, although rather restrictedly at 37 °C, and acid production was not observed on CREA. Isolates from stored maize, South Africa, as a culture contaminant of Bipolaris sorokiniana from indoor air in Finland proved to be related to, but different from A. ustus and A. puniceus. The taxon is proposed as the new species A. pseudoustus. Although supported only by low bootstrap values, F. monodii was found to belong to section Usti based on phylogenetic analysis of either loci BLAST searches to the GenBank database also resulted in closest hits from section Usti. This species obviously

  2. Plant and Crop Databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Databases have become an integral part of all aspects of biological research, including basic and applied plant biology. The importance of databases continues to increase as the volume of data from direct and indirect genomics approaches expands. What is not always obvious to users of databases is t...

  3. Identification of alternative splice variants in Aspergillus flavus through comparison of multiple tandem MS search algorithms

    PubMed Central

    2011-01-01

    Background Database searching is the most frequently used approach for automated peptide assignment and protein inference of tandem mass spectra. The results, however, depend on the sequences in target databases and on search algorithms. Recently by using an alternative splicing database, we identified more proteins than with the annotated proteins in Aspergillus flavus. In this study, we aimed at finding a greater number of eligible splice variants based on newly available transcript sequences and the latest genome annotation. The improved database was then used to compare four search algorithms: Mascot, OMSSA, X! Tandem, and InsPecT. Results The updated alternative splicing database predicted 15833 putative protein variants, 61% more than the previous results. There was transcript evidence for 50% of the updated genes compared to the previous 35% coverage. Database searches were conducted using the same set of spectral data, search parameters, and protein database but with different algorithms. The false discovery rates of the peptide-spectrum matches were estimated < 2%. The numbers of the total identified proteins varied from 765 to 867 between algorithms. Whereas 42% (1651/3891) of peptide assignments were unanimous, the comparison showed that 51% (568/1114) of the RefSeq proteins and 15% (11/72) of the putative splice variants were inferred by all algorithms. 12 plausible isoforms were discovered by focusing on the consensus peptides which were detected by at least three different algorithms. The analysis found different conserved domains in two putative isoforms of UDP-galactose 4-epimerase. Conclusions We were able to detect dozens of new peptides using the improved alternative splicing database with the recently updated annotation of the A. flavus genome. Unlike the identifications of the peptides and the RefSeq proteins, large variations existed between the putative splice variants identified by different algorithms. 12 candidates of putative isoforms

  4. Pituitary aspergillus infection.

    PubMed

    Moore, Lauren A; Erstine, Emily M; Prayson, Richard A

    2016-07-01

    Fungal infection should be considered in the differential diagnosis of a pituitary or sellar mass, albeit fungal infections involving the pituitary gland and sella are a rare occurrence. We report a case of Aspergillus infection involving the pituitary gland and sellar region discovered in a 74-year-old man. The patient had a history of hypertension, chronic renal disease, autoimmune hemolytic anemia and presented with right eye pain, headaches and worsening hemiparesis. Imaging studies revealed a right internal carotid artery occlusion and an acute right pontine stroke along with smaller infarcts in the right middle cerebral artery distribution. Clinically, the patient was thought to have vasculitis. An infectious etiology was not identified. He developed respiratory distress and died. At autopsy, necrotizing meningitis was discovered. A predominantly chronic inflammatory cell infiltrate consisting of benign-appearing lymphocytes, plasma cells and macrophages was accompanied by acute angle branching, angioinvasive hyphae which were highlighted on Gomori methenamine silver staining and were morphologically consistent with Aspergillus species. In previously reported cases of Aspergillus infection involving the pituitary or sella, most presented with headaches or impaired vision and were not immunocompromised. A transsphenoidal surgical approach is recommended in suspected cases in order to minimize the risk of dissemination of the infection. Some patients have responded well to antifungal medications once diagnosed. PMID:26896907

  5. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  6. What's My Substrate? Computational Function Assignment of Candida parapsilosis ADH5 by Genome Database Search, Virtual Screening, and QM/MM Calculations.

    PubMed

    Dhoke, Gaurao V; Ensari, Yunus; Davari, Mehdi D; Ruff, Anna Joëlle; Schwaneberg, Ulrich; Bocola, Marco

    2016-07-25

    Zinc-dependent medium chain reductase from Candida parapsilosis can be used in the reduction of carbonyl compounds to pharmacologically important chiral secondary alcohols. To date, the nomenclature of cpADH5 is differing (CPCR2/RCR/SADH) in the literature, and its natural substrate is not known. In this study, we utilized a substrate docking based virtual screening method combined with KEGG, MetaCyc pathway, and Candida genome databases search for the discovery of natural substrates of cpADH5. The virtual screening of 7834 carbonyl compounds from the ZINC database provided 94 aldehydes or methyl/ethyl ketones as putative carbonyl substrates. Out of which, 52 carbonyl substrates of cpADH5 with catalytically active docking pose were identified by employing mechanism based substrate docking protocol. Comparison of the virtual screening results with KEGG, MetaCyc database search, and Candida genome pathway analysis suggest that cpADH5 might be involved in the Ehrlich pathway (reduction of fusel aldehydes in leucine, isoleucine, and valine degradation). Our QM/MM calculations and experimental activity measurements affirmed that butyraldehyde substrates are the potential natural substrates of cpADH5, suggesting a carbonyl reductase role for this enzyme in butyraldehyde reduction in aliphatic amino acid degradation pathways. Phylogenetic tree analysis of known ADHs from Candida albicans shows that cpADH5 is close to caADH5. We therefore propose, according to the experimental substrate identification and sequence similarity, the common name butyraldehyde dehydrogenase cpADH5 for Candida parapsilosis CPCR2/RCR/SADH. PMID:27387009

  7. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach.

    PubMed

    Yang, Xue-Dong; Tan, Hua-Wei; Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. As the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all of 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of spinach genome were identified in 11 cultivated and wild spinach cultivars. The expression profiles were also performed with RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools. PMID:27148975

  8. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach

    PubMed Central

    Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. As the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all of 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of spinach genome were identified in 11 cultivated and wild spinach cultivars. The expression profiles were also performed with RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools. PMID:27148975

  9. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.)

    PubMed Central

    Tang, Zhaohui; Ren, Yongkang; Li, Yali; Zhang, Dayong; Dong, Yanhui; Zhao, Xinghua

    2015-01-01

    Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising source to

  10. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

    PubMed

    Han, Bin; Wang, Changbiao; Tang, Zhaohui; Ren, Yongkang; Li, Yali; Zhang, Dayong; Dong, Yanhui; Zhao, Xinghua

    2015-01-01

    Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising source to

  11. Aspergillus: a primer for the novice.

    PubMed

    Bennett, J W

    2009-01-01

    Aspergillus is a genus of molds named after the morphological structure that bears asexual spores, the aspergillum, which resembles a liturgical device. This genus contains several species of positive or negative economic importance in industry, agriculture and medicine. The majority of aspergilli, including most species of economic importance, are known to reproduce only by asexual spores. Genome projects have been completed for A. fumigatus, A. nidulans, A. niger and A. oryzae; several other species are also being sequenced. The data from these genome projects have been useful in elucidating aspects of phylogeny, the evolution of sexuality and the extent of secondary metabolite diversity. To date, however, the impact on drug discovery, diagnosis of aspergillosis, and our understanding of fungal pathogenesis has been less pronounced. PMID:19253144

  12. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

    PubMed Central

    2014-01-01

    Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413

  13. Metabolomics of Aspergillus fumigatus.

    PubMed

    Frisvad, Jens C; Rank, Christian; Nielsen, Kristian F; Larsen, Thomas O

    2009-01-01

    Aspergillus fumigatus is the most important species in Aspergillus causing infective lung diseases. This species has been reported to produce a large number of extrolites, including secondary metabolites, acids, and proteins such as hydrophobins and extracellular enzymes. At least 226 potentially bioactive secondary metabolites have been reported from A. fumigatus that can be ordered into 24 biosynthetic families. Of these families we have detected representatives from the following families of secondary metabolites: fumigatins, fumigaclavines, fumiquinazolines, trypacidin and monomethylsulochrin, fumagillins, gliotoxins, pseurotins, chloroanthraquinones, fumitremorgins, verruculogen, helvolic acids, and pyripyropenes by HPLC with diode array detection and mass spectrometric detection. There is still doubt whether A. fumigatus can produce tryptoquivalins, but all isolates produce the related fumiquinazolines. We also tentatively detected sphingofungins in A. fumigatus Af293 and in an isolate of A. lentulus. The sphingofungins may have a similar role as the toxic fumonisins, found in A. niger. A further number of mycotoxins, including ochratoxin A, and other secondary metabolites have been reported from A. fumigatus, but in those cases either the fungus or its metabolite appear to be misidentified. PMID:18763205

  14. Previously unknown species of Aspergillus.

    PubMed

    Gautier, M; Normand, A-C; Ranque, S

    2016-08-01

    The use of multi-locus DNA sequence analysis has led to the description of previously unknown 'cryptic' Aspergillus species, whereas classical morphology-based identification of Aspergillus remains limited to the section or species-complex level. The current literature highlights two main features concerning these 'cryptic' Aspergillus species. First, the prevalence of such species in clinical samples is relatively high compared with emergent filamentous fungal taxa such as Mucorales, Scedosporium or Fusarium. Second, it is clearly important to identify these species in the clinical laboratory because of the high frequency of antifungal drug-resistant isolates of such Aspergillus species. Matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) has recently been shown to enable the identification of filamentous fungi with an accuracy similar to that of DNA sequence-based methods. As MALDI-TOF MS is well suited to the routine clinical laboratory workflow, it facilitates the identification of these 'cryptic' Aspergillus species at the routine mycology bench. The rapid establishment of enhanced filamentous fungi identification facilities will lead to a better understanding of the epidemiology and clinical importance of these emerging Aspergillus species. Based on routine MALDI-TOF MS-based identification results, we provide original insights into the key interpretation issues of a positive Aspergillus culture from a clinical sample. Which ubiquitous species that are frequently isolated from air samples are rarely involved in human invasive disease? Can both the species and the type of biological sample indicate Aspergillus carriage, colonization or infection in a patient? Highly accurate routine filamentous fungi identification is central to enhance the understanding of these previously unknown Aspergillus species, with a vital impact on further improved patient care. PMID:27263029

  15. Characteristic clinical features of Aspergillus appendicitis: Case report and literature review

    PubMed Central

    Gjeorgjievski, Mihajlo; Amin, Mitual B; Cappell, Mitchell S

    2015-01-01

    This work aims to facilitate diagnosing Aspergillus appendicitis, which can be missed clinically due to its rarity, by proposing a clinical pentad for Aspergillus appendicitis based on literature review and one new case. The currently reported case of pathologically-proven Aspergillus appendicitis was identified by computerized search of pathology database at William Beaumont Hospital, 1999-2014. Prior cases were identified by computerized literature search. Among 10980 pathology reports of pathologically-proven appendicitis, one case of Aspergillus appendicitis was identified (rate = 0.01%). A young boy with profound neutropenia, recent chemotherapy, and acute myelogenous leukemia presented with right lower quadrant pain, pyrexia, and generalized malaise. Abdominal computed tomography scan showed a thickened appendiceal wall and periappendiceal inflammation, suggesting appendicitis. Emergent laparotomy showed an inflamed, thickened appendix, which was resected. The patient did poorly postoperatively with low-grade-fevers while receiving antibacterial therapy, but rapidly improved after initiating amphotericin therapy. Microscopic examination of a silver stain of the appendectomy specimen revealed fungi with characteristic Aspergillus morphology, findings confirmed by immunohistochemistry. Primary Aspergillus appendicitis is exceptionally rare, with only 3 previously reported cases. All three cases presented with (1)-neutropenia, (2)-recent chemotherapy, (3)-acute leukemia, and (4)-suspected appendicitis; (5)-the two prior cases initially treated with antibacterial therapy, fared poorly before instituting anti-Aspergillus therapy. The current patient satisfied all these five criteria. Based on these four cases, a clinical pentad is proposed for Aspergillus appendicitis: clinically-suspected appendicitis, neutropenia, recent chemotherapy, acute leukemia, and poor clinical response if treated solely by antibacterial/anti-candidial therapy. Patients presenting with

  16. Legal aspects of genetic databases for international biomedical research: the example of the International Cancer Genome Consortium (ICGC).

    PubMed

    Romeo-Casabona, Carlos; Nicolás, Pilar; Knoppers, Bartha Maria; Joly, Yann; Wallace, Susan E; Chalmers, Don; Dyke, Stephanie; Kennedy, Karen; Troncoso, Antonio; Kaan, Terry; Rial-Sebbag, Emmanuelle

    2012-01-01

    There is a noticeable lack of international regulation on personal data exchange and management in research. This article sheds light in this area by describing how the International Cancer Genome Consortium is developing policies and procedures to address the ethical and legal issues raised by the international transfer of data and results. These policies and procedures aim, first and most importantly, to safeguard the interests of the research participants and other involved stakeholders and, secondly, to facilitate the sharing of data and results to realize greater benefits from this kind of internationally collaborative genetic research. PMID:23520913

  17. Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database

    PubMed Central

    2013-01-01

    Background Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. Methods We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. Results We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium, (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. Conclusions About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma

  18. Characterization of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers for Aspergillus flavus: Emphasis on Variability of Isolates from the Southern United States

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple Sequence Repeat (SSR) markers were developed from Aspergillus flavus expressed sequence tag (EST) database to conduct an analysis of genetic relationships of Aspergillus isolates from numerous host species and geographical regions, but primarily from the United States. Twenty-nine primers wer...

  19. Tremorgenic Mycotoxins from Aspergillus Caespitosus

    PubMed Central

    Schroeder, H. W.; Cole, R. J.; Hein, H.; Kirksey, J. W.

    1975-01-01

    Two tremorgenic mycotoxins were isolated from Aspergillus caespitosus, and identified as verruculogen and fumitremorgin B. They were produced at the rate of 172 and 325 mg per kg, respectively, on autoclaved cracked field corn. PMID:1155935

  20. Tremorgenic mycotoxins from Aspergillus caespitosus.

    PubMed

    Schroeder, H W; Cole, R J; Hein, H; Kirksey, J W

    1975-06-01

    Two tremorgenic mycotoxins were isolated from Aspergillus caespitosus, and identified as verruculogen and fumitremorgin B. They were produced at the rate of 172 and 325 mg per kg, respectively, on autoclaved cracked field corn. PMID:1155935

  1. Aspergillus fumigatus in Poultry

    PubMed Central

    Arné, Pascal; Thierry, Simon; Wang, Dongying; Deville, Manjula; Le Loc'h, Guillaume; Desoutter, Anaïs; Féménia, Françoise; Nieguitsila, Adélaïde; Huang, Weiyi; Chermette, René; Guillot, Jacques

    2011-01-01

    Aspergillus fumigatus remains a major respiratory pathogen in birds. In poultry, infection by A. fumigatus may induce significant economic losses particularly in turkey production. A. fumigatus develops and sporulates easily in poor quality bedding or contaminated feedstuffs in indoor farm environments. Inadequate ventilation and dusty conditions increase the risk of bird exposure to aerosolized spores. Acute cases are seen in young animals following inhalation of spores, causing high morbidity and mortality. The chronic form affects older birds and looks more sporadic. The respiratory tract is the primary site of A. fumigatus development leading to severe respiratory distress and associated granulomatous airsacculitis and pneumonia. Treatments for infected poultry are nonexistent; therefore, prevention is the only way to protect poultry. Development of avian models of aspergillosis may improve our understanding of its pathogenesis, which remains poorly understood. PMID:21826144

  2. Cryptic Aspergillus nidulans Antimicrobials▿

    PubMed Central

    Giles, Steve S.; Soukup, Alexandra A.; Lauer, Carrie; Shaaban, Mona; Lin, Alexander; Oakley, Berl R.; Wang, Clay C. C.; Keller, Nancy P.

    2011-01-01

    Secondary metabolite (SM) production by fungi is hypothesized to provide some fitness attribute for the producing organisms. However, most SM clusters are “silent” when fungi are grown in traditional laboratory settings, and it is difficult to ascertain any function or activity of these SM cluster products. Recently, the creation of a chromatin remodeling mutant in Aspergillus nidulans induced activation of several cryptic SM gene clusters. Systematic testing of nine purified metabolites from this mutant identified an emodin derivate with efficacy against both human fungal pathogens (inhibiting both spore germination and hyphal growth) and several bacteria. The ability of catalase to diminish this antimicrobial activity implicates reactive oxygen species generation, specifically, the generation of hydrogen peroxide, as the mechanism of emodin hydroxyl activity. PMID:21478304

  3. Update on antifungal drug resistance mechanisms of Aspergillus fumigatus.

    PubMed

    Chamilos, G; Kontoyiannis, D P

    2005-12-01

    Although the arsenal of agents with anti-Aspergillus activity has expanded over the last decade, mortality due to invasive aspergillosis (IA) remains unacceptably high. Aspergillus fumigatus still accounts for the majority of cases of IA; however less susceptible to antifungals non-fumigatus aspergilli began to emerge. Antifungal drug resistance of Aspergillus might partially account for treatment failures. Recent advances in our understanding of mechanisms of antifungal drug action in Aspergillus, along with the standardization of in vitro susceptibility testing methods, has brought resistance testing to the forefront of clinical mycology. In addition, molecular biology has started to shed light on the mechanisms of resistance of A. fumigatus to azoles and the echinocandins, while genome-based assays show promise for high-throughput screening for genotypic antifungal resistance. Several problems remain, however, in the study of this complex area. Large multicenter clinical studies--point prevalence or longitudinal--to capture the incidence and prevalence of antifungal resistance in A. fumigatus isolates are lacking. Correlation of in vitro susceptibility with clinical outcome and susceptibility breakpoints has not been established. In addition, the issue of cross-resistance between the newer triazoles is of concern. Furthermore, in vitro resistance testing for polyenes and echinocandins is difficult, and their mechanisms of resistance are largely unknown. This review examines challenges in the diagnosis, epidemiology, and mechanisms of antifungal drug resistance in A. fumigatus. PMID:16488654

  4. Aspergillus mulundensis sp. nov., a new species for the fungus producing the antifungal echinocandin lipopeptides, mulundocandins.

    PubMed

    Bills, Gerald F; Yue, Qun; Chen, Li; Li, Yan; An, Zhiqiang; Frisvad, Jens C

    2016-03-01

    The invalidly published name Aspergillus sydowii var. mulundensis was proposed for a strain of Aspergillus that produced new echinocandin metabolites designated as the mulundocadins. Reinvestigation of this strain (Y-30462=DSMZ 5745) using phylogenetic, morphological, and metabolic data indicated that it is a distinct and novel species of Aspergillus sect. Nidulantes. The taxonomic novelty, Aspergillus mulundensis, is introduced for this historically important echinocandin-producing strain. The closely related A. nidulans FGSC A4 has one of the most extensively characterized secondary metabolomes of any filamentous fungus. Comparison of the full-genome sequences of DSMZ 5745 and FGSC A4 indicated that the two strains share 33 secondary metabolite biosynthetic gene clusters. These shared gene clusters represent ~45% of the total secondary metabolome of each strain, thus indicating a high level intraspecific divergence in terms of secondary metabolism. PMID:26464011

  5. RegTransBase - A Database Of Regulatory Sequences and Interactionsin a Wide Range of Prokaryotic Genomes

    SciTech Connect

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, AndreyA.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatoryinteractions in prokaryotes, captures the knowledge in publishedscientific literature using a controlled vocabulary. Although a number ofdatabases describing interactions between regulatory proteins and theirbinding sites are currently being maintained, they focus mostly on themodel organisms Escherichia coli and Bacillus subtilis, or are entirelycomputationally derived. RegTransBase describes a large number ofregulatory interactions reported in many organisms and contains varioustypes of experimental data, in particular: the activation or repressionof transcription by an identified direct regulator; determining thetranscriptional regulatory function of a protein (or RNA) directlybinding to DNA (RNA); mapping or prediction of binding site for aregulatory protein; characterization of regulatory mutations. Currently,the RegTransBase content is derived from about 3000 relevant articlesdescribing over 7000 experiments in relation to 128 microbes. It containsdata on the regulation of about 7500 genes and evidence for 6500interactions with 650 regulators. RegTransBase also contains manuallycreated position weight matrices (PWM) that can be used to identifycandidate regulatory sites in over 60 species. RegTransBase is availableat http://regtransbase.lbl.gov.

  6. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies

    PubMed Central

    Li, Mulin Jun; Liu, Zipeng; Wang, Panwen; Wong, Maria P.; Nelson, Matthew R.; Kocher, Jean-Pierre A.; Yeager, Meredith; Sham, Pak Chung; Chanock, Stephen J.; Xia, Zhengyuan; Wang, Junwen

    2016-01-01

    Genome-wide association studies (GWASs), now as a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb) which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0×10−3) from each original publication, and generated a total of 252 530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of website. PMID:26615194

  7. Canadian Open Genetics Repository (COGR): a unified clinical genomics database as a community resource for standardising and sharing genetic interpretations

    PubMed Central

    Lerner-Ellis, Jordan; Wang, Marina; White, Shana; Lebo, Matthew S

    2015-01-01

    Background The Canadian Open Genetics Repository is a collaborative effort for the collection, storage, sharing and robust analysis of variants reported by medical diagnostics laboratories across Canada. As clinical laboratories adopt modern genomics technologies, the need for this type of collaborative framework is increasingly important. Methods A survey to assess existing protocols for variant classification and reporting was delivered to clinical genetics laboratories across Canada. Based on feedback from this survey, a variant assessment tool was made available to all laboratories. Each participating laboratory was provided with an instance of GeneInsight, a software featuring versioning and approval processes for variant assessments and interpretations and allowing for variant data to be shared between instances. Guidelines were established for sharing data among clinical laboratories and in the final outreach phase, data will be made readily available to patient advocacy groups for general use. Results The survey demonstrated the need for improved standardisation and data sharing across the country. A variant assessment template was made available to the community to aid with standardisation. Instances of the GeneInsight tool were provided to clinical diagnostic laboratories across Canada for the purpose of uploading, transferring, accessing and sharing variant data. Conclusions As an ongoing endeavour and a permanent resource, the Canadian Open Genetics Repository aims to serve as a focal point for the collaboration of Canadian laboratories with other countries in the development of tools that take full advantage of laboratory data in diagnosing, managing and treating genetic diseases. PMID:25904639

  8. Gene Expression Profiling and Identification of Resistance Genes to Aspergillus flavus Infection in Peanut through EST and Microarray Strategies

    PubMed Central

    Guo, Baozhu; Fedorova, Natalie D.; Chen, Xiaoping; Wan, Chun-Hua; Wang, Wei; Nierman, William C.; Bhatnagar, Deepak; Yu, Jiujiang

    2011-01-01

    Aspergillus flavus and A. parasiticus infect peanut seeds and produce aflatoxins, which are associated with various diseases in domestic animals and humans throughout the world. The most cost-effective strategy to minimize aflatoxin contamination involves the development of peanut cultivars that are resistant to fungal infection and/or aflatoxin production. To identify peanut Aspergillus-interactive and peanut Aspergillus-resistance genes, we carried out a large scale peanut Expressed Sequence Tag (EST) project which we used to construct a peanut glass slide oligonucleotide microarray. The fabricated microarray represents over 40% of the protein coding genes in the peanut genome. For expression profiling, resistant and susceptible peanut cultivars were infected with a mixture of Aspergillus flavus and parasiticus spores. The subsequent microarray analysis identified 62 genes in resistant cultivars that were up-expressed in response to Aspergillus infection. In addition, we identified 22 putative Aspergillus-resistance genes that were constitutively up-expressed in the resistant cultivar in comparison to the susceptible cultivar. Some of these genes were homologous to peanut, corn, and soybean genes that were previously shown to confer resistance to fungal infection. This study is a first step towards a comprehensive genome-scale platform for developing Aspergillus-resistant peanut cultivars through targeted marker-assisted breeding and genetic engineering. PMID:22069737

  9. MDP, a database linking drug response data to genomic information, identifies dasatinib and statins as a combinatorial strategy to inhibit YAP/TAZ in cancer cells

    PubMed Central

    Zannini, Alessandro; Caroli, Jimmy; Beneventano, Domenico; Anderlucci, Laura; Lolli, Marco; Bicciato, Silvio; Del Sal, Giannino

    2015-01-01

    Targeted anticancer therapies represent the most effective pharmacological strategies in terms of clinical responses. In this context, genetic alteration of several oncogenes represents an optimal predictor of response to targeted therapy. Integration of large-scale molecular and pharmacological data from cancer cell lines promises to be effective in the discovery of new genetic markers of drug sensitivity and of clinically relevant anticancer compounds. To define novel pharmacogenomic dependencies in cancer, we created the Mutations and Drugs Portal (MDP, http://mdp.unimore.it), a web accessible database that combines the cell-based NCI60 screening of more than 50,000 compounds with genomic data extracted from the Cancer Cell Line Encyclopedia and the NCI60 DTP projects. MDP can be queried for drugs active in cancer cell lines carrying mutations in specific cancer genes or for genetic markers associated to sensitivity or resistance to a given compound. As proof of performance, we interrogated MDP to identify both known and novel pharmacogenomics associations and unveiled an unpredicted combination of two FDA-approved compounds, namely statins and Dasatinib, as an effective strategy to potently inhibit YAP/TAZ in cancer cells. PMID:26513174

  10. TreeTFDB: An Integrative Database of the Transcription Factors from Six Economically Important Tree Crops for Functional Predictions and Comparative and Functional Genomics

    PubMed Central

    Mochida, Keiichi; Yoshida, Takuhiro; Sakurai, Tetsuya; Yamaguchi-Shinozaki, Kazuko; Shinozaki, Kazuo; Tran, Lam-Son Phan

    2013-01-01

    Crop plants, whose productivity is affected by a wide range of growing and environmental conditions, are grown for economic purposes. Transcription factors (TFs) play central role in regulation of many biological processes, including plant development and responses to environmental stimuli, by activating or repressing spatiotemporal gene expression. Here, we describe the TreeTFDB (http://treetfdb.bmep.riken.jp/index.pl) that houses the TF repertoires of six economically important tree crop species: Jatropha curcas, papaya, cassava, poplar, castor bean and grapevine. Among these, the TF repertoire of J. curcas has not been reported by any other TF databases. In addition to their basic information, such as sequence and domain features, domain alignments, gene ontology assignment and sequence comparison, information on available full-length cDNAs, identity and positions of all types of known cis-motifs found in the promoter regions, gene expression data are provided. With its newly designed and friendly interface and its unique features, TreeTFDB will enable research community to predict the functions and provide access to available genetic resources for performing comparative and functional genomics of the crop TFs, either individually or at whole family level, in a comprehensive and convenient manner. PMID:23284086

  11. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  12. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India.

    PubMed

    Masih, Aradhana; Singh, Pradeep K; Kathuria, Shallu; Agarwal, Kshitij; Meis, Jacques F; Chowdhary, Anuradha

    2016-09-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization-time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma. PMID:27413188

  13. Cyclopiazonic Acid Biosynthesis of Aspergillus flavus and Aspergillus oryzae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cyclopiazonic acid (CPA) is an indole-tetramic acid neurotoxin produced by some of the same strains of A. flavus that produce aflatoxins and by some Aspergillus oryzae strains. Despite its discovery 40 years ago, few reviews of its toxicity and biosynthesis have been reported. This review examines w...

  14. Host Genes Involved in the Interaction between Aspergillus flavus and Maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination caused by Aspergillus flavus is a major concern in maize production. Understanding the complex interrelationships of genes during the A. flavus-maize interaction may be the key to developing strategies to interrupt the aflatoxin contamination process. The A. flavus Genome Seq...

  15. HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    PubMed Central

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-01-01

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak

  16. Aspergillus infections in cystic fibrosis.

    PubMed

    King, Jill; Brunel, Shan F; Warris, Adilia

    2016-07-01

    Patients with cystic fibrosis (CF) suffer from chronic lung infection and airway inflammation. Respiratory failure secondary to chronic or recurrent infection remains the commonest cause of death and accounts for over 90% of mortality. Bacteria as Staphylococcus aureus, Pseudomonas aeruginosa and Burkholderia cepacia complex have been regarded the main CF pathogens and their role in progressive lung decline has been studied extensively. Little attention has been paid to the role of Aspergillus spp. and other filamentous fungi in the pathogenesis of non-ABPA (allergic bronchopulmonary aspergillosis) respiratory disease in CF, despite their frequent recovery in respiratory samples. It has become more apparent however, that Aspergillus spp. may play an important role in chronic lung disease in CF. Research delineating the underlying mechanisms of Aspergillus persistence and infection in the CF lung and its link to lung deterioration is lacking. This review summarizes the Aspergillus disease phenotypes observed in CF, discusses the role of CFTR (cystic fibrosis transmembrane conductance regulator)-protein in innate immune responses and new treatment modalities. PMID:27177733

  17. 76 FR 16297 - Aspergillus flavus

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-23

    ...This regulation establishes an exemption from the requirement of a tolerance for residues of the microbial pesticide, Aspergillus flavus AF36, in or on corn food and feed commodities, when applied/used as an antifungal agent. The Arizona Cotton Research and Protection Council submitted a petition to EPA under the Federal Food, Drug, and Cosmetic Act (FFDCA), requesting an amendment to the......

  18. Sexual recombination in Aspergillus tubingensis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus tubingensis from section Nigri (Black Aspergilli) is closely related to A. niger and is used extensively in the industrial production of enzymes and organic acids. We recently discovered sexual reproduction in A. tubingensis and in this study, demonstrate that the progeny are products o...

  19. Sexual reproduction in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is the major producer of carcinogenic aflatoxins in crops worldwide and is also an important opportunistic human pathogen in aspergillosis. The sexual state of this heterothallic fungus is described from crosses between strains of the opposite mating type. Sexual reproduction oc...

  20. RNA-Seq-Based Transcriptome Analysis of Aflatoxigenic Aspergillus flavus in Response to Water Activity

    PubMed Central

    Zhang, Feng; Guo, Zhenni; Zhong, Hong; Wang, Sen; Yang, Weiqiang; Liu, Yongfeng; Wang, Shihua

    2014-01-01

    Aspergillus flavus is one of the most important producers of carcinogenic aflatoxins in crops, and the effect of water activity (aw) on growth and aflatoxin production of A. flavus has been previously studied. Here we found the strains under 0.93 aw exhibited decreased conidiation and aflatoxin biosynthesis compared to that under 0.99 aw. When RNA-Seq was used to delineate gene expression profile under different water activities, 23,320 non-redundant unigenes, with an average length of 1297 bp, were yielded. By database comparisons, 19,838 unigenes were matched well (e-value < 10−5) with known gene sequences, and another 6767 novel unigenes were obtained by comparison to the current genome annotation of A. flavus. Based on the RPKM equation, 5362 differentially expressed unigenes (with |log2Ratio| ≥ 1) were identified between 0.99 aw and 0.93 aw treatments, including 3156 up-regulated and 2206 down-regulated unigenes, suggesting that A. flavus underwent an extensive transcriptome response during water activity variation. Furthermore, we found that the expression of 16 aflatoxin producing-related genes decreased obviously when water activity decreased, and the expression of 11 development-related genes increased after 0.99 aw treatment. Our data corroborate a model where water activity affects aflatoxin biosynthesis through increasing the expression of aflatoxin producing-related genes and regulating development-related genes. PMID:25421810

  1. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    PubMed Central

    Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991

  2. Aspergillus asperus and Aspergillus collinsii, two new species from Aspergillus section Usti

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In sampling fungi from the built environment, two isolates that could not confidently be placed in described species were encountered. Phenotypic analysis suggested that they belonged in Aspergillus sect. Usti. In order to verify the sectional placement and to assure that they were undescribed rathe...

  3. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This report is provided each year to our stakeholders in the maize genetic community. In this report, we describe the five-year plan for MaizeGDB reviewed in early 2008 by the USDA-ARS peer review process and which was developed with inputs from our Working Group and the Allerton 2007 Report (MNL 82...

  4. Overexpression of Aspergillus tubingensis faeA in protease-deficient Aspergillus niger enables ferulic acid production from plant material.

    PubMed

    Zwane, Eunice N; Rose, Shaunita H; van Zyl, Willem H; Rumbold, Karl; Viljoen-Bloom, Marinda

    2014-06-01

    The production of ferulic acid esterase involved in the release of ferulic acid side groups from xylan was investigated in strains of Aspergillus tubingensis, Aspergillus carneus, Aspergillus niger and Rhizopus oryzae. The highest activity on triticale bran as sole carbon source was observed with the A. tubingensis T8.4 strain, which produced a type A ferulic acid esterase active against methyl p-coumarate, methyl ferulate and methyl sinapate. The activity of the A. tubingensis ferulic acid esterase (AtFAEA) was inhibited twofold by glucose and induced twofold in the presence of maize bran. An initial accumulation of endoglucanase was followed by the production of endoxylanase, suggesting a combined action with ferulic acid esterase on maize bran. A genomic copy of the A. tubingensis faeA gene was cloned and expressed in A. niger D15#26 under the control of the A. niger gpd promoter. The recombinant strain has reduced protease activity and does not acidify the media, therefore promoting high-level expression of recombinant enzymes. It produced 13.5 U/ml FAEA after 5 days on autoclaved maize bran as sole carbon source, which was threefold higher than for the A. tubingensis donor strain. The recombinant AtFAEA was able to extract 50 % of the available ferulic acid from non-pretreated maize bran, making this enzyme suitable for the biological production of ferulic acid from lignocellulosic plant material. PMID:24664515

  5. Atypical Aspergillus parasiticus isolates from pistachio with aflR gene nucleotide insertion identical to Aspergillus sojae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are the most toxic and carcinogenic secondary metabolites produced primarily by the filamentous fungi Aspergillus flavus and Aspergillus parasiticus. The toxins cause devastating economic losses because of strict regulations on distribution of contaminated products. Aspergillus sojae are...

  6. Developmental regulators in Aspergillus fumigatus.

    PubMed

    Park, Hee-Soo; Yu, Jae-Hyuk

    2016-03-01

    The filamentous fungus Aspergillus fumigatus is the most prevalent airborne fungal pathogen causing severe and usually fatal invasive aspergillosis in immunocompromised patients. This fungus produces a large number of small hydrophobic asexual spores called conidia as the primary means of reproduction, cell survival, propagation, and infectivity. The initiation, progression, and completion of asexual development (conidiation) is controlled by various regulators that govern expression of thousands of genes associated with formation of the asexual developmental structure conidiophore, and biogenesis of conidia. In this review, we summarize key regulators that directly or indirectly govern conidiation in this important pathogenic fungus. Better understanding these developmental regulators may provide insights into the improvement in controlling both beneficial and detrimental aspects of various Aspergillus species. PMID:26920882

  7. Data-based Reconstruction of Gene Regulatory Networks of Fungal Pathogens

    PubMed Central

    Guthke, Reinhard; Gerber, Silvia; Conrad, Theresia; Vlaic, Sebastian; Durmuş, Saliha; Çakır, Tunahan; Sevilgen, F. E.; Shelest, Ekaterina; Linde, Jörg

    2016-01-01

    In the emerging field of systems biology of fungal infection, one of the central roles belongs to the modeling of gene regulatory networks (GRNs). Utilizing omics-data, GRNs can be predicted by mathematical modeling. Here, we review current advances of data-based reconstruction of both small-scale and large-scale GRNs for human pathogenic fungi. The advantage of large-scale genome-wide modeling is the possibility to predict central (hub) genes and thereby indicate potential biomarkers and drug targets. In contrast, small-scale GRN models provide hypotheses on the mode of gene regulatory interactions, which have to be validated experimentally. Due to the lack of sufficient quantity and quality of both experimental data and prior knowledge about regulator–target gene relations, the genome-wide modeling still remains problematic for fungal pathogens. While a first genome-wide GRN model has already been published for Candida albicans, the feasibility of such modeling for Aspergillus fumigatus is evaluated in the present article. Based on this evaluation, opinions are drawn on future directions of GRN modeling of fungal pathogens. The crucial point of genome-wide GRN modeling is the experimental evidence, both used for inferring the networks (omics ‘first-hand’ data as well as literature data used as prior knowledge) and for validation and evaluation of the inferred network models. PMID:27148247

  8. Data-based Reconstruction of Gene Regulatory Networks of Fungal Pathogens.

    PubMed

    Guthke, Reinhard; Gerber, Silvia; Conrad, Theresia; Vlaic, Sebastian; Durmuş, Saliha; Çakır, Tunahan; Sevilgen, F E; Shelest, Ekaterina; Linde, Jörg

    2016-01-01

    In the emerging field of systems biology of fungal infection, one of the central roles belongs to the modeling of gene regulatory networks (GRNs). Utilizing omics-data, GRNs can be predicted by mathematical modeling. Here, we review current advances of data-based reconstruction of both small-scale and large-scale GRNs for human pathogenic fungi. The advantage of large-scale genome-wide modeling is the possibility to predict central (hub) genes and thereby indicate potential biomarkers and drug targets. In contrast, small-scale GRN models provide hypotheses on the mode of gene regulatory interactions, which have to be validated experimentally. Due to the lack of sufficient quantity and quality of both experimental data and prior knowledge about regulator-target gene relations, the genome-wide modeling still remains problematic for fungal pathogens. While a first genome-wide GRN model has already been published for Candida albicans, the feasibility of such modeling for Aspergillus fumigatus is evaluated in the present article. Based on this evaluation, opinions are drawn on future directions of GRN modeling of fungal pathogens. The crucial point of genome-wide GRN modeling is the experimental evidence, both used for inferring the networks (omics 'first-hand' data as well as literature data used as prior knowledge) and for validation and evaluation of the inferred network models. PMID:27148247

  9. Generation, annotation, and analysis of an extensive Aspergillus niger EST collection

    PubMed Central

    Semova, Natalia; Storms, Reginald; John, Tricia; Gaudet, Pascale; Ulycznyj, Peter; Min, Xiang Jia; Sun, Jian; Butler, Greg; Tsang, Adrian

    2006-01-01

    Background Aspergillus niger, a saprophyte commonly found on decaying vegetation, is widely used and studied for industrial purposes. Despite its place as one of the most important organisms for commercial applications, the lack of available information about its genetic makeup limits research with this filamentous fungus. Results We present here the analysis of 12,820 expressed sequence tags (ESTs) generated from A. niger cultured under seven different growth conditions. These ESTs identify about 5,108 genes of which 44.5% code for proteins sharing similarity (E ≤ 1e -5) with GenBank entries of known function, 38% code for proteins that only share similarity with GenBank entries of unknown function and 17.5% encode proteins that do not have a GenBank homolog. Using the Gene Ontology hierarchy, we present a first classification of the A. niger proteins encoded by these genes and compare its protein repertoire with other well-studied fungal species. We have established a searchable web-based database that includes the EST and derived contig sequences and their annotation. Details about this project and access to the annotated A. niger database are available. Conclusion This EST collection and its annotation provide a significant resource for fundamental and applied research with A. niger. The gene set identified in this manuscript will be highly useful in the annotation of the genome sequence of A. niger, the genes described in the manuscript, especially those encoding hydrolytic enzymes will provide a valuable source for researchers interested in enzyme properties and applications. PMID:16457709

  10. An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database.

    PubMed

    Kobayashi, Naohiro; Harano, Yoko; Tochio, Naoya; Nakatani, Eiichi; Kigawa, Takanori; Yokoyama, Shigeyuki; Mading, Steve; Ulrich, Eldon L; Markley, John L; Akutsu, Hideo; Fujiwara, Toshimichi

    2012-08-01

    Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including (1)H, (13)C and (15)N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001-2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies. PMID:22689068

  11. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  12. Genome-wide expression analysis upon constitutive activation of the HacA bZIP transcription factor in Aspergillus niger reveals a coordinated cellular response to counteract ER stress

    PubMed Central

    2012-01-01

    Background HacA/Xbp1 is a conserved bZIP transcription factor in eukaryotic cells which regulates gene expression in response to various forms of secretion stress and as part of secretory cell differentiation. In the present study, we replaced the endogenous hacA gene of an Aspergillus niger strain with a gene encoding a constitutively active form of the HacA transcription factor (HacACA). The impact of constitutive HacA activity during exponential growth was explored in bioreactor controlled cultures using transcriptomic analysis to identify affected genes and processes. Results Transcription profiles for the wild-type strain (HacAWT) and the HacACA strain were obtained using Affymetrix GeneChip analysis of three replicate batch cultures of each strain. In addition to the well known HacA targets such as the ER resident foldases and chaperones, GO enrichment analysis revealed up-regulation of genes involved in protein glycosylation, phospholipid biosynthesis, intracellular protein transport, exocytosis and protein complex assembly in the HacACA mutant. Biological processes over-represented in the down-regulated genes include those belonging to central metabolic pathways, translation and transcription. A remarkable transcriptional response in the HacACA strain was the down-regulation of the AmyR transcription factor and its target genes. Conclusions The results indicate that the constitutive activation of the HacA leads to a coordinated regulation of the folding and secretion capacity of the cell, but with consequences on growth and fungal physiology to reduce secretion stress. PMID:22846479

  13. LAMP-PCR detection of ochratoxigenic Aspergillus species collected from peanut kernel.

    PubMed

    Al-Sheikh, H M

    2015-01-01

    Over the last decade, ochratoxin A (OTA) has been widely described and is ubiquitous in several agricultural products. Ochratoxins represent the second-most important mycotoxin group after aflatoxins. A total of 34 samples were surveyed from 3 locations, including Mecca, Madina, and Riyadh, Saudi Arabia, during 2012. Fungal contamination frequency was determined for surface-sterilized peanut seeds, which were seeded onto malt extract agar media. Aspergillus niger (35%), Aspergillus ochraceus (30%), and Aspergillus carbonarius (25%) were the most frequently observed Aspergillius species, while Aspergillus flavus and Aspergillus phoenicis isolates were only infrequently recovered and in small numbers (10%). OTA production was evaluated on yeast extract sucrose medium, which revealed that 57% of the isolates were A. niger and 60% of A. carbonarius isolates were OTA producers; 100% belonged to A. ochraceus. Only one isolate, morphologically identified as A. carbonarius, and 3 A. niger isolates unstably produced OTA. A polymerase chain reaction (PCR)-based identification and detection assay was used to identify A. ochraceus isolates. Using the primer sets OCRA1/OCRA2, 400-base pair PCR fragments were produced only when genomic DNA from A. ochraceus isolates was used. Recently, the loop-mediated isothermal amplification assay using recombinase polymerase amplification chemistry was used for A. carbonarius and A. niger DNA identification. As a non-gel-based technique, the amplification product was directly visualized in the reaction tube after adding calcein for naked-eye examination. PMID:25729999

  14. Genetics of Polyketide Metabolism in Aspergillus nidulans

    PubMed Central

    Klejnstrup, Marie L.; Frandsen, Rasmus J. N.; Holm, Dorte K.; Nielsen, Morten T.; Mortensen, Uffe H.; Larsen, Thomas O.; Nielsen, Jakob B.

    2012-01-01

    Secondary metabolites are small molecules that show large structural diversity and a broad range of bioactivities. Some metabolites are attractive as drugs or pigments while others act as harmful mycotoxins. Filamentous fungi have the capacity to produce a wide array of secondary metabolites including polyketides. The majority of genes required for production of these metabolites are mostly organized in gene clusters, which often are silent or barely expressed under laboratory conditions, making discovery and analysis difficult. Fortunately, the genome sequences of several filamentous fungi are publicly available, greatly facilitating the establishment of links between genes and metabolites. This review covers the attempts being made to trigger the activation of polyketide metabolism in the fungal model organism Aspergillus nidulans. Moreover, it will provide an overview of the pathways where ten polyketide synthase genes have been coupled to polyketide products. Therefore, the proposed biosynthesis of the following metabolites will be presented; naphthopyrone, sterigmatocystin, aspyridones, emericellamides, asperthecin, asperfuranone, monodictyphenone/emodin, orsellinic acid, and the austinols. PMID:24957370

  15. Functional Analysis of the Aspergillus nidulans Kinome

    PubMed Central

    De Souza, Colin P.; Hashmi, Shahr B.; Osmani, Aysha H.; Andrews, Peter; Ringelberg, Carol S.; Dunlap, Jay C.; Osmani, Stephen A.

    2013-01-01

    The filamentous fungi are an ecologically important group of organisms which also have important industrial applications but devastating effects as pathogens and agents of food spoilage. Protein kinases have been implicated in the regulation of virtually all biological processes but how they regulate filamentous fungal specific processes is not understood. The filamentous fungus Aspergillus nidulans has long been utilized as a powerful molecular genetic system and recent technical advances have made systematic approaches to study large gene sets possible. To enhance A. nidulans functional genomics we have created gene deletion constructs for 9851 genes representing 93.3% of the encoding genome. To illustrate the utility of these constructs, and advance the understanding of fungal kinases, we have systematically generated deletion strains for 128 A. nidulans kinases including expanded groups of 15 histidine kinases, 7 SRPK (serine-arginine protein kinases) kinases and an interesting group of 11 filamentous fungal specific kinases. We defined the terminal phenotype of 23 of the 25 essential kinases by heterokaryon rescue and identified phenotypes for 43 of the 103 non-essential kinases. Uncovered phenotypes ranged from almost no growth for a small number of essential kinases implicated in processes such as ribosomal biosynthesis, to conditional defects in response to cellular stresses. The data provide experimental evidence that previously uncharacterized kinases function in the septation initiation network, the cell wall integrity and the morphogenesis Orb6 kinase signaling pathways, as well as in pathways regulating vesicular trafficking, sexual development and secondary metabolism. Finally, we identify ChkC as a third effector kinase functioning in the cellular response to genotoxic stress. The identification of many previously unknown functions for kinases through the functional analysis of the A. nidulans kinome illustrates the utility of the A. nidulans gene

  16. Cyclopiazonic Acid Biosynthesis of Aspergillus flavus and Aspergillus oryzae

    PubMed Central

    Chang, Perng-Kuang; Ehrlich, Kenneth C.; Fujii, Isao

    2009-01-01

    Cyclopiazonic acid (CPA) is an indole-tetramic acid neurotoxin produced by some of the same strains of A. flavus that produce aflatoxins and by some Aspergillus oryzae strains. Despite its discovery 40 years ago, few reviews of its toxicity and biosynthesis have been reported. This review examines what is currently known about the toxicity of CPA to animals and humans, both by itself or in combination with other mycotoxins. The review also discusses CPA biosynthesis and the genetic diversity of CPA production in A. flavus/oryzae populations. PMID:22069533

  17. The PIR-International databases.

    PubMed Central

    Barker, W C; George, D G; Mewes, H W; Pfeiffer, F; Tsugita, A

    1993-01-01

    PIR-International is an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. PIR-International is most noted for the Protein Sequence Database. This database originated in the early 1960's with the pioneering work of the late Margaret Dayhoff as a research tool for the study of protein evolution and intersequence relationships; it is maintained as a scientific resource, organized by biological concepts, using sequence homology as a guiding principle. PIR-International also maintains a number of other genomic, protein sequence, and sequence-related databases. The databases of PIR-International are made widely available. This paper briefly describes the architecture of the Protein Sequence Database, a number of other PIR-International databases, and mechanisms for providing access to and for distribution of these databases. PMID:8332528

  18. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae.

    PubMed

    Galagan, James E; Calvo, Sarah E; Cuomo, Christina; Ma, Li-Jun; Wortman, Jennifer R; Batzoglou, Serafim; Lee, Su-In; Baştürkmen, Meray; Spevak, Christina C; Clutterbuck, John; Kapitonov, Vladimir; Jurka, Jerzy; Scazzocchio, Claudio; Farman, Mark; Butler, Jonathan; Purcell, Seth; Harris, Steve; Braus, Gerhard H; Draht, Oliver; Busch, Silke; D'Enfert, Christophe; Bouchier, Christiane; Goldman, Gustavo H; Bell-Pedersen, Deborah; Griffiths-Jones, Sam; Doonan, John H; Yu, Jaehyuk; Vienken, Kay; Pain, Arnab; Freitag, Michael; Selker, Eric U; Archer, David B; Peñalva, Miguel A; Oakley, Berl R; Momany, Michelle; Tanaka, Toshihiro; Kumagai, Toshitaka; Asai, Kiyoshi; Machida, Masayuki; Nierman, William C; Denning, David W; Caddick, Mark; Hynes, Michael; Paoletti, Mathieu; Fischer, Reinhard; Miller, Bruce; Dyer, Paul; Sachs, Matthew S; Osmani, Stephen A; Birren, Bruce W

    2005-12-22

    The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation. PMID:16372000

  19. Prophage Genomics

    PubMed Central

    Canchaya, Carlos; Proux, Caroline; Fournous, Ghislain; Bruttin, Anne; Brüssow, Harald

    2003-01-01

    The majority of the bacterial genome sequences deposited in the National Center for Biotechnology Information database contain prophage sequences. Analysis of the prophages suggested that after being integrated into bacterial genomes, they undergo a complex decay process consisting of inactivating point mutations, genome rearrangements, modular exchanges, invasion by further mobile DNA elements, and massive DNA deletion. We review the technical difficulties in defining such altered prophage sequences in bacterial genomes and discuss theoretical frameworks for the phage-bacterium interaction at the genomic level. The published genome sequences from three groups of eubacteria (low- and high-G+C gram-positive bacteria and γ-proteobacteria) were screened for prophage sequences. The prophages from Streptococcus pyogenes served as test case for theoretical predictions of the role of prophages in the evolution of pathogenic bacteria. The genomes from further human, animal, and plant pathogens, as well as commensal and free-living bacteria, were included in the analysis to see whether the same principles of prophage genomics apply for bacteria living in different ecological niches and coming from distinct phylogenetical affinities. The effect of selection pressure on the host bacterium is apparently an important force shaping the prophage genomes in low-G+C gram-positive bacteria and γ-proteobacteria. PMID:12794192

  20. Two novel species of Aspergillus section Nigri from indoor air

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus collinsii, Aspergillus floridensis, and Aspergillus trinidadensis are described as novel uniseriate species of Aspergillus section Nigri isolated from air samples. To describe the species we used phenotypes from 7-d Czapek yeast extract agar culture (CYA) and malt extract agar culture (M...

  1. Biological databases for human research.

    PubMed

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-02-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  2. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  3. Aspergillus Osteomyelitis of the Skull.

    PubMed

    Nicholson, Simon; King, Richard; Chumas, Paul; Russell, John; Liddington, Mark

    2016-07-01

    Osteomyelitis of the craniofacial skeleton is rare, with fungal pathogens least commonly implicated. The authors present 2 patients of osteomyelitis of the skull caused by Aspergillus spp. and discuss the diagnosis, clinicopathological course, and management strategies.Late recurrence seen in this type of infection warrants long-term follow-up and a high index of suspicion for the clinical signs associated with recurrence.Such patients would benefit from their surgical debridement being planned and managed via a specialist craniofacial unit, so as to utilize the most aesthetically sensitive approach and the experience of specialists from several surgical disciplines. PMID:27391523

  4. PRIMED: PRIMEr database for deleting and tagging all fission and budding yeast genes developed using the open-source genome retrieval script (GRS).

    PubMed

    Cummings, Michael T; Joh, Richard I; Motamedi, Mo

    2015-01-01

    The fission (Schizosaccharomyces pombe) and budding (Saccharomyces cerevisiae) yeasts have served as excellent models for many seminal discoveries in eukaryotic biology. In these organisms, genes are deleted or tagged easily by transforming cells with PCR-generated DNA inserts, flanked by short (50-100 bp) regions of gene homology. These PCR reactions use especially designed long primers, which, in addition to the priming sites, carry homology for gene targeting. Primer design follows a fixed method but is tedious and time-consuming especially when done for a large number of genes. To automate this process, we developed the Python-based Genome Retrieval Script (GRS), an easily customizable open-source script for genome analysis. Using GRS, we created PRIMED, the complete PRIMEr D atabase for deleting and C-terminal tagging genes in the main S. pombe and five of the most commonly used S. cerevisiae strains. Because of the importance of noncoding RNAs (ncRNAs) in many biological processes, we also included the deletion primer set for these features in each genome. PRIMED are accurate and comprehensive and are provided as downloadable Excel files, removing the need for future primer design, especially for large-scale functional analyses. Furthermore, the open-source GRS can be used broadly to retrieve genome information from custom or other annotated genomes, thus providing a suitable platform for building other genomic tools by the yeast or other research communities. PMID:25643023

  5. Navigating Public Microarray Databases

    PubMed Central

    Bähler, Jürg

    2004-01-01

    With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources. PMID:18629145

  6. Navigating public microarray databases.

    PubMed

    Penkett, Christopher J; Bähler, Jürg

    2004-01-01

    With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources. PMID:18629145

  7. Diversity of Aspergillus oryzae genotypes (RFLP) isolated from traditional soy sauce production within Malaysia and Southeast Asia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA fingerprinting was performed on 64 strains of Aspergillus oryzae and one strain of A. sojae isolated from soysauce factories within Malaysia and Southeast Asia that use primitive traditional methods in producing 'tamari type' Cantonese soy sauce. PstI digests of total genomic DNA from each isol...

  8. Beyond aflatoxin: four distinct expression patterns and functional roles associated with Aspergillus flavus secondary metabolism gene clusters

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species of Aspergillus produce a diverse array of secondary metabolites, and recent genomic analysis has predicted that these species have the capacity to synthesize many more compounds. It has been possible to infer the presence of 55 gene clusters associated with secondary metabolism in Aspergill...

  9. NITRIFICATION BY ASPERGILLUS FLAVUS1

    PubMed Central

    Marshall, K. C.; Alexander, M.

    1962-01-01

    Marshall, K. C. (Cornell University, Ithaca, N. Y.) and M. Alexander. Nitrification by Aspergillus flavus. J. Bacteriol. 83:572–578. 1962.—Aspergillus flavus has been shown to produce bound hydroxylamine, nitrite, and nitrate when grown in peptone, amino acid, or buffered ammonium media. Free hydroxylamine was not detected in these cultures, but it was found in an unbuffered ammonium medium in which neither nitrite nor nitrate was formed. Evidence was obtained for the presence of β-nitropropionic acid in the filtrate of an actively nitrifying culture. Alumina treatment of an ammonium medium prevented the formation by growing cultures of nitrite and nitrate but not bound hydroxylamine. The effect of alumina treatment was reversed by the addition of 10−3m CeCl3 to the medium. Extracts of the fungus contained peroxidase and an enzyme capable of catalyzing the production of nitrite from β-nitropropionic acid. The nitrite-forming enzyme is apparently specific for β-nitropropionate; no activity was found with nitromethane, nitroethane, and nitropropane as substrates. Nitrate was not reduced to nitrite nor was nitrite oxidized to nitrate by the hyphal extracts. The significance of these observations in nitrification by A. flavus is discussed. PMID:14470254

  10. Secondary metabolite profiles and antifungal drug susceptibility of Aspergillus fumigatus and closely related species, Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans.

    PubMed

    Tamiya, Hiroyuki; Ochiai, Eri; Kikuchi, Kazuyo; Yahiro, Maki; Toyotome, Takahito; Watanabe, Akira; Yaguchi, Takashi; Kamei, Katsuhiko

    2015-05-01

    The incidence of Aspergillus infection has been increasing in the past few years. Also, new Aspergillus fumigatus-related species, namely Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans, were shown to infect humans. These fungi exhibit marked morphological similarities to A. fumigatus, albeit with different clinical courses and antifungal drug susceptibilities. The present study used liquid chromatography/time-of-flight mass spectrometry to identify the secondary metabolites secreted as virulence factors by these Aspergillus species and compared their antifungal susceptibility. The metabolite profiles varied widely among A. fumigatus, A. lentulus, A. udagawae, and A. viridinutans, producing 27, 13, 8, and 11 substances, respectively. Among the mycotoxins, fumifungin, fumiquinazoline A/B and D, fumitremorgin B, gliotoxin, sphingofungins, pseurotins, and verruculogen were only found in A. fumigatus, whereas auranthine was only found in A. lentulus. The amount of gliotoxin, one of the most abundant mycotoxins in A. fumigatus, was negligible in these related species. In addition, they had decreased susceptibility to antifungal agents such as itraconazole and voriconazole, even though metabolites that were shared in the isolates showing higher minimum inhibitory concentrations than epidemiological cutoff values were not detected. These strikingly different secondary metabolite profiles may lead to the development of more discriminative identification protocols for such closely related Aspergillus species as well as improved treatment outcomes. PMID:25737146

  11. The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection

    PubMed Central

    Fernández-Suárez, Xosé M.; Galperin, Michael Y.

    2013-01-01

    The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases and the other half provide updates on the databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using the genome data to improve human health is reflected in the development of the databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). PMID:23203983

  12. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  13. Electronic Databases.

    ERIC Educational Resources Information Center

    Williams, Martha E.

    1985-01-01

    Presents examples of bibliographic, full-text, and numeric databases. Also discusses how to access these databases online, aids to online retrieval, and several issues and trends (including copyright and downloading, transborder data flow, use of optical disc/videodisc technology, and changing roles in database generation and processing). (JN)

  14. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  15. ord1, an oxidoreductase gene responsible for conversion of O-methylsterigmatocystin to aflatoxin in Aspergillus flavus.

    PubMed Central

    Prieto, R; Woloshuk, C P

    1997-01-01

    Among the enzymatic steps in the aflatoxin biosynthetic pathway, the conversion of O-methylsterigmatocystin to aflatoxin has been proposed to be catalyzed by an oxidoreductase. Transformants of Aspergillus flavus 649WAF2 containing a 3.3-kb genomic DNA fragment and the aflatoxin biosynthesis regulatory gene aflR converted exogenously supplied O-methylsterigmatocystin to aflatoxin B1. A gene, ord1, corresponding to a transcript of about 2 kb was identified within the 3.3-kb DNA fragment. The promoter region presented a putative AFLR binding site and a TATA sequence. The nucleotide sequence of the gene revealed an open reading frame encoding a protein of 528 amino acids with a deduced molecular mass of 60.2 kDa. The gene contained six introns and seven exons. Heterologous expression of the ord1 open reading frame under the transcriptional control of the Saccharomyces cerevisiae galactose-inducible gal1 promoter results in the ability to convert O-methylsterigmatocystin to aflatoxin B1. The data indicate that ord1 is sufficient to accomplish the last step of the aflatoxin biosynthetic pathway. A search of various databases for similarity indicated that ord1 encodes a cytochrome P-450-type monooxygenase, and the gene has been assigned to a new P-450 gene family named CYP64. PMID:9143099

  16. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    PubMed

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. PMID:27243603

  17. Genetic diversity of Aspergillus species isolated from onychomycosis and Aspergillus hongkongensis sp. nov., with implications to antifungal susceptibility testing.

    PubMed

    Tsang, Chi-Ching; Hui, Teresa W S; Lee, Kim-Chung; Chen, Jonathan H K; Ngan, Antonio H Y; Tam, Emily W T; Chan, Jasper F W; Wu, Andrea L; Cheung, Mei; Tse, Brian P H; Wu, Alan K L; Lai, Christopher K C; Tsang, Dominic N C; Que, Tak-Lun; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2016-02-01

    Thirteen Aspergillus isolates recovered from nails of 13 patients (fingernails, n=2; toenails, n=11) with onychomycosis were characterized. Twelve strains were identified by multilocus sequencing as Aspergillus spp. (Aspergillus sydowii [n=4], Aspergillus welwitschiae [n=3], Aspergillus terreus [n=2], Aspergillus flavus [n=1], Aspergillus tubingensis [n=1], and Aspergillus unguis [n=1]). Isolates of A. terreus, A. flavus, and A. unguis were also identifiable by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The 13th isolate (HKU49(T)) possessed unique morphological characteristics different from other Aspergillus spp. Molecular characterization also unambiguously showed that HKU49(T) was distinct from other Aspergillus spp. We propose the novel species Aspergillus hongkongensis to describe this previously unknown fungus. Antifungal susceptibility testing showed most Aspergillus isolates had low MICs against itraconazole and voriconazole, but all Aspergillus isolates had high MICs against fluconazole. A diverse spectrum of Aspergillus species is associated with onychomycosis. Itraconazole and voriconazole are probably better drug options for Aspergillus onychomycosis. PMID:26658315

  18. Three new species of Aspergillus section Flavi isolated from almonds and maize in Portugal

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Three new aflatoxin-producing species belonging to Aspergillus section Flavi are described, Aspergillus mottae, Aspergillus sergii and Aspergillus transmontanensis. These species were isolated from Portuguese almonds and maize. An investigation examining morphology, extrolites and molecular data was...

  19. Characterization of the Aspergillus ochraceoroseus aflatoxin/sterigmatocystin biosynthetic gene cluster

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Production of the carcinogenic aflatoxins has been reported from members of Aspergillus section Flavi, Aspergillus section Nidulantes, and a newly proposed section, Aspergillus section Ochraceorosei that consists of Aspergillus ochraceoroseus and A. rambellii. Unlike members of section Flavi, A. oc...

  20. Protein Model Database

    SciTech Connect

    Fidelis, K; Adzhubej, A; Kryshtafovych, A; Daniluk, P

    2005-02-23

    The phenomenal success of the genome sequencing projects reveals the power of completeness in revolutionizing biological science. Currently it is possible to sequence entire organisms at a time, allowing for a systemic rather than fractional view of their organization and the various genome-encoded functions. There is an international plan to move towards a similar goal in the area of protein structure. This will not be achieved by experiment alone, but rather by a combination of efforts in crystallography, NMR spectroscopy, and computational modeling. Only a small fraction of structures are expected to be identified experimentally, the remainder to be modeled. Presently there is no organized infrastructure to critically evaluate and present these data to the biological community. The goal of the Protein Model Database project is to create such infrastructure, including (1) public database of theoretically derived protein structures; (2) reliable annotation of protein model quality, (3) novel structure analysis tools, and (4) access to the highest quality modeling techniques available.

  1. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    SciTech Connect

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  2. [Overexpression of Aspergillus candidus lactase and analysis of enzymatic properties].

    PubMed

    Zhang, Wei; Fan, Yun-liu; Yao, Bin

    2005-04-01

    The lactase gene lacb' from Aspergillus candidus was fused behind alpha-factor signal sequence in the Pichia pastoris expression vector pPIC9, then integrated into the genome of P. pastoris by recombination events. The P. pastoris recombinants for lactase overexpression were screened by enzyme activity analysis and SDS-PAGE. The lactase expressed in P. pastoris was glycosylated protein with an apparent molecular weight of 130 kD, while the deglycosylated lactase treated with Endo H had an apparent molecular weight of about 110 kD. The expression level of secreted lactase protein in recombinant P. pastoris was 6 mg/mL with enzymatic activity of 3600 U/mL in the 5 L fermenter, which was the highest among that of all kinds of recombinant strains reported now. The optimal pH and optimal temperature of the lactase are 5.2 and 60 degrees C. The Vmax, Km, and specific activity of the lactase are 3.3 micromol/min, 1.7 mmol/L and 706.5 +/- 2.6 U/mg, respectively. Compare to the lactase from Aspergillus oryzae ATCC 20423, the expressed lactase from A. candidus have better enzymatic properties including the high thermostability, high specific activity and wide pH range for enzyme reaction. PMID:15989270

  3. Polyketides in Aspergillus terreus: biosynthesis pathway discovery and application.

    PubMed

    Yin, Ying; Cai, Menghao; Zhou, Xiangshan; Li, Zhiyong; Zhang, Yuanxing

    2016-09-01

    The knowledge of biosynthesis gene clusters, production improving methods, and bioactivity mechanisms is very important for the development of filamentous fungi metabolites. Metabolic engineering and heterologous expression methods can be applied to improve desired metabolite production, when their biosynthesis pathways have been revealed. And, stable supplement is a necessary basis of bioactivity mechanism discovery and following clinical trial. Aspergillus terreus is an outstanding producer of many bioactive agents, and a large part of them are polyketides. In this review, we took polyketides from A. terreus as examples, focusing on 13 polyketide synthase (PKS) genes in A. terreus NIH 2624 genome. The biosynthesis pathways of nine PKS genes have been reported, and their downstream metabolites are lovastatin, terreic acid, terrein, geodin, terretonin, citreoviridin, and asperfuranone, respectively. Among them, lovastatin is a well-known hypolipidemic agent. Terreic acid, terrein, citreoviridin, and asperfuranone show good bioactivities, especially anticancer activities. On the other hand, geodin and terretonin are mycotoxins. So, biosynthesis gene cluster information is important for the production or elimination of them. We also predicted three possible gene clusters that contain four PKS genes by homologous gene alignment with other Aspergillus strains. We think that this is an effective way to mine secondary metabolic gene clusters. PMID:27455860

  4. Aspergillus bronchitis in cystic fibrosis.

    PubMed

    Shoseyov, David; Brownlee, Keith G; Conway, Steven P; Kerem, Eitan

    2006-07-01

    Aspergillus fumigatus, a widely distributed spore-bearing fungus, is commonly grown in sputum cultures of patients with cystic fibrosis (CF). A fumigatus may cause allergic bronchopulmonary aspergillosis (ABPA), a complex condition that leads to worsening of airway inflammation and progressive damage and is diagnosed by specific criteria. In this report, we present six CF patients with respiratory deterioration that did not respond to appropriate antibiotic treatment. All had had A fumigatus in sputum cultures but did not fulfill the criteria of ABPA. Treatment with antifungal agents was followed by improvement in clinical condition. We suggest that in patients with CF, A fumigatus should be considered as a pathogen that may directly cause respiratory exacerbations. Antifungal therapy should be considered when deteriorating respiratory function is not responding to antibacterial therapy and A fumigatus is growing in sputum cultures. PMID:16840406

  5. Aspergillus Infections in Transplant Recipients

    PubMed Central

    Singh, Nina; Paterson, David L.

    2005-01-01

    Aspergillus infections are occurring with an increasing frequency in transplant recipients. Notable changes in the epidemiologic characteristics of this infection have occurred; these include a change in risk factors and later onset of infection. Management of invasive aspergillosis continues to be challenging, and the mortality rate, despite the use of newer antifungal agents, remains unacceptably high. Performing molecular studies to discern new targets for antifungal activity, identifying signaling pathways that may be amenable to immunologic interventions, assessing combination regimens of antifungal agents or combining antifungal agents with modulation of the host defense mechanisms, and devising diagnostic assays that can rapidly and reliably diagnose infections represent areas for future investigations that may lead to further improvement in outcomes. PMID:15653818

  6. Data on the presence or absence of genes encoding essential proteins for ochratoxin and fumonisin biosynthesis in Aspergillus niger and Aspergillus welwitschiae

    PubMed Central

    Massi, Fernanda Pelisson; Sartori, Daniele; Ferranti, Larissa de Souza; Iamanaka, Beatriz Thie; Taniwaki, Marta Hiromi; Vieira, Maria Lucia Carneiro; Fungaro, Maria Helena Pelegrinelli

    2016-01-01

    We present the multiplex PCR data for the presence/absence of genes involved in OTA and FB2 biosynthesis in Aspergillus niger/Aspergillus welwitschiae strains isolated from different food substrates in Brazil. Among the 175 strains analyzed, four mPCR profiles were found: Profile 1 (17%) highlights strains harboring in their genome the pks, radH and the fum8 genes. Profile 2 (3.5%) highlights strains harboring genes involved in OTA biosynthesis i.e. radH and pks. Profile 3 (51.5%) highlights strains harboring the fum8 gene. Profile 4 (28%) highlights strains not carrying the genes studied herein. This research content is supplemental to our original research article, “Prospecting for the incidence of genes involved in ochratoxin and fumonisin biosynthesis in Brazilian strains of A. niger and A. welwitschiae” [1]. PMID:27054181

  7. Data on the presence or absence of genes encoding essential proteins for ochratoxin and fumonisin biosynthesis in Aspergillus niger and Aspergillus welwitschiae.

    PubMed

    Massi, Fernanda Pelisson; Sartori, Daniele; Ferranti, Larissa de Souza; Iamanaka, Beatriz Thie; Taniwaki, Marta Hiromi; Vieira, Maria Lucia Carneiro; Fungaro, Maria Helena Pelegrinelli

    2016-06-01

    We present the multiplex PCR data for the presence/absence of genes involved in OTA and FB2 biosynthesis in Aspergillus niger/Aspergillus welwitschiae strains isolated from different food substrates in Brazil. Among the 175 strains analyzed, four mPCR profiles were found: Profile 1 (17%) highlights strains harboring in their genome the pks, radH and the fum8 genes. Profile 2 (3.5%) highlights strains harboring genes involved in OTA biosynthesis i.e. radH and pks. Profile 3 (51.5%) highlights strains harboring the fum8 gene. Profile 4 (28%) highlights strains not carrying the genes studied herein. This research content is supplemental to our original research article, "Prospecting for the incidence of genes involved in ochratoxin and fumonisin biosynthesis in Brazilian strains of A. niger and A. welwitschiae" [1]. PMID:27054181

  8. KdmB, a Jumonji Histone H3 Demethylase, Regulates Genome-Wide H3K4 Trimethylation and Is Required for Normal Induction of Secondary Metabolism in Aspergillus nidulans.

    PubMed

    Gacek-Matthews, Agnieszka; Berger, Harald; Sasaki, Takahiko; Wittstein, Kathrin; Gruber, Clemens; Lewis, Zachary A; Strauss, Joseph

    2016-08-01

    Histone posttranslational modifications (HPTMs) are involved in chromatin-based regulation of fungal secondary metabolite biosynthesis (SMB) in which the corresponding genes-usually physically linked in co-regulated clusters-are silenced under optimal physiological conditions (nutrient-rich) but are activated when nutrients are limiting. The exact molecular mechanisms by which HPTMs influence silencing and activation, however, are still to be better understood. Here we show by a combined approach of quantitative mass spectrometry (LC-MS/MS), genome-wide chromatin immunoprecipitation (ChIP-seq) and transcriptional network analysis (RNA-seq) that the core regions of silent A. nidulans SM clusters generally carry low levels of all tested chromatin modifications and that heterochromatic marks flank most of these SM clusters. During secondary metabolism, histone marks typically associated with transcriptional activity such as H3 trimethylated at lysine-4 (H3K4me3) are established in some, but not all gene clusters even upon full activation. KdmB, a Jarid1-family histone H3 lysine demethylase predicted to comprise a BRIGHT domain, a zinc-finger and two PHD domains in addition to the catalytic Jumonji domain, targets and demethylates H3K4me3 in vivo and mediates transcriptional downregulation. Deletion of kdmB leads to increased transcription of about ~1750 genes across nutrient-rich (primary metabolism) and nutrient-limiting (secondary metabolism) conditions. Unexpectedly, an equally high number of genes exhibited reduced expression in the kdmB deletion strain and notably, this group was significantly enriched for genes with known or predicted functions in secondary metabolite biosynthesis. Taken together, this study extends our general knowledge about multi-domain KDM5 histone demethylases and provides new details on the chromatin-level regulation of fungal secondary metabolite production. PMID:27548260

  9. KdmB, a Jumonji Histone H3 Demethylase, Regulates Genome-Wide H3K4 Trimethylation and Is Required for Normal Induction of Secondary Metabolism in Aspergillus nidulans

    PubMed Central

    Gacek-Matthews, Agnieszka; Sasaki, Takahiko; Wittstein, Kathrin; Gruber, Clemens; Strauss, Joseph

    2016-01-01

    Histone posttranslational modifications (HPTMs) are involved in chromatin-based regulation of fungal secondary metabolite biosynthesis (SMB) in which the corresponding genes—usually physically linked in co-regulated clusters—are silenced under optimal physiological conditions (nutrient-rich) but are activated when nutrients are limiting. The exact molecular mechanisms by which HPTMs influence silencing and activation, however, are still to be better understood. Here we show by a combined approach of quantitative mass spectrometry (LC-MS/MS), genome-wide chromatin immunoprecipitation (ChIP-seq) and transcriptional network analysis (RNA-seq) that the core regions of silent A. nidulans SM clusters generally carry low levels of all tested chromatin modifications and that heterochromatic marks flank most of these SM clusters. During secondary metabolism, histone marks typically associated with transcriptional activity such as H3 trimethylated at lysine-4 (H3K4me3) are established in some, but not all gene clusters even upon full activation. KdmB, a Jarid1-family histone H3 lysine demethylase predicted to comprise a BRIGHT domain, a zinc-finger and two PHD domains in addition to the catalytic Jumonji domain, targets and demethylates H3K4me3 in vivo and mediates transcriptional downregulation. Deletion of kdmB leads to increased transcription of about ~1750 genes across nutrient-rich (primary metabolism) and nutrient-limiting (secondary metabolism) conditions. Unexpectedly, an equally high number of genes exhibited reduced expression in the kdmB deletion strain and notably, this group was significantly enriched for genes with known or predicted functions in secondary metabolite biosynthesis. Taken together, this study extends our general knowledge about multi-domain KDM5 histone demethylases and provides new details on the chromatin-level regulation of fungal secondary metabolite production. PMID:27548260

  10. Clustered array of ochratoxin A biosynthetic genes in Aspergillus steynii and their expression patterns in permissive conditions.

    PubMed

    Gil-Serna, Jessica; Vázquez, Covadonga; González-Jaén, María Teresa; Patiño, Belén

    2015-12-01

    Aspergillus steynii is probably the most relevant species of section Circumdati producing ochratoxin A (OTA). This mycotoxin contaminates a wide number of commodities and it is highly toxic for humans and animals. Little is known on the biosynthetic genes and their regulation in Aspergillus species. In this work, we identified and analysed three contiguous genes in A. steynii using 5'-RACE and genome walking approaches which predicted a cytochrome P450 monooxygenase (p450ste), a non-ribosomal peptide synthetase (nrpsste) and a polyketide synthase (pksste). These three genes were contiguous within a 20742 bp long genomic DNA fragment. Their corresponding cDNA were sequenced and their expression was analysed in three A. steynii strains using real time RT-PCR specific assays in permissive conditions in in vitro cultures. OTA was also analysed in these cultures. Comparative analyses of predicted genomic, cDNA and amino acid sequences were performed with sequences of similar gene functions. All the results obtained in these analyses were consistent and point out the involvement of these three genes in OTA biosynthesis by A. steynii and showed a co-ordinated expression pattern. This is the first time that a clustered organization OTA biosynthetic genes has been reported in Aspergillus genus. The results also suggested that this situation might be common in Aspergillus OTA-producing species and distinct to the one described for Penicillium species. PMID:26256718

  11. Comparative and Functional Genomics in Identifying Aflatoxin Biosynthetic Genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identification of genes involved in aflatoxin biosynthesis through Aspergillus flavus genomics has been actively pursued. A. flavus Expressed Sequence Tags (EST’s) and whole genome sequencing have been completed. Groups of genes that are potentially involved in aflatoxin production have been profi...

  12. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    PubMed Central

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  13. Recurrent prosthetic valve endocarditis caused by Aspergillus delacroxii (formerly Aspergillus nidulans var. echinulatus)

    PubMed Central

    Uhrin, Gábor Balázs; Jensen, Rasmus Hare; Korup, Eva; Grønlund, Jens; Hjort, Ulla; Moser, Claus; Arendrup, Maiken Cavling; Schønheyder, Henrik Carl

    2015-01-01

    We report Aspergillus delacroxii (formerly Aspergillus nidulans var. echinulatus) causing recurrent prosthetic valve endocarditis. The fungus was the sole agent detected during replacement of a mechanical aortic valve conduit due to abscess formation. Despite extensive surgery and anti-fungal treatment, the patient had a cerebral hemorrhage 4 months post-surgery prompting a diagnosis of recurrent prosthetic valve endocarditis and fungemia. PMID:26909244

  14. Environmental and Developmental Factors Influencing Aflatoxin Production by Aspergillus flavus and Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are carcinogenic mycotoxins formed by a number of fungi in the genus Aspergillus. The major fungi responsible for aflatoxin formation in crop seeds in the field and in storage are Aspergillus flavus and A. parasiticus. This review emphasizes developmental, environmental, biological and ...

  15. Statistical databases

    SciTech Connect

    Kogalovskii, M.R.

    1995-03-01

    This paper presents a review of problems related to statistical database systems, which are wide-spread in various fields of activity. Statistical databases (SDB) are referred to as databases that consist of data and are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether the present Database Management Systems (DBMS) satisfy the SDB requirements. Some actual research directions in SDB systems are considered.

  16. Sesterterpene ophiobolin biosynthesis involving multiple gene clusters in Aspergillus ustus

    PubMed Central

    Chai, Hangzhen; Yin, Ru; Liu, Yongfeng; Meng, Huiying; Zhou, Xianqiang; Zhou, Guolin; Bi, Xupeng; Yang, Xue; Zhu, Tonghan; Zhu, Weiming; Deng, Zixin; Hong, Kui

    2016-01-01

    Terpenoids are the most diverse and abundant natural products among which sesterterpenes account for less than 2%, with very few reports on their biosynthesis. Ophiobolins are tricyclic 5–8–5 ring sesterterpenes with potential pharmaceutical application. Aspergillus ustus 094102 from mangrove rizhosphere produces ophiobolin and other terpenes. We obtained five gene cluster knockout mutants, with altered ophiobolin yield using genome sequencing and in silico analysis, combined with in vivo genetic manipulation. Involvement of the five gene clusters in ophiobolin synthesis was confirmed by investigation of the five key terpene synthesis relevant enzymes in each gene cluster, either by gene deletion and complementation or in vitro verification of protein function. The results demonstrate that ophiobolin skeleton biosynthesis involves five gene clusters, which are responsible for C15, C20, C25, and C30 terpenoid biosynthesis. PMID:27273151

  17. Sesterterpene ophiobolin biosynthesis involving multiple gene clusters in Aspergillus ustus.

    PubMed

    Chai, Hangzhen; Yin, Ru; Liu, Yongfeng; Meng, Huiying; Zhou, Xianqiang; Zhou, Guolin; Bi, Xupeng; Yang, Xue; Zhu, Tonghan; Zhu, Weiming; Deng, Zixin; Hong, Kui

    2016-01-01

    Terpenoids are the most diverse and abundant natural products among which sesterterpenes account for less than 2%, with very few reports on their biosynthesis. Ophiobolins are tricyclic 5-8-5 ring sesterterpenes with potential pharmaceutical application. Aspergillus ustus 094102 from mangrove rizhosphere produces ophiobolin and other terpenes. We obtained five gene cluster knockout mutants, with altered ophiobolin yield using genome sequencing and in silico analysis, combined with in vivo genetic manipulation. Involvement of the five gene clusters in ophiobolin synthesis was confirmed by investigation of the five key terpene synthesis relevant enzymes in each gene cluster, either by gene deletion and complementation or in vitro verification of protein function. The results demonstrate that ophiobolin skeleton biosynthesis involves five gene clusters, which are responsible for C15, C20, C25, and C30 terpenoid biosynthesis. PMID:27273151

  18. In-silico analysis of Aspergillus niger beta-glucosidases

    NASA Astrophysics Data System (ADS)

    Yeo S., L.; Shazilah, K.; Suhaila, S.; Abu Bakar F., D.; Murad A. M., A.

    2014-09-01

    Genomic data mining was carried out and revealed a total of seventeen β-glucosidases in filamentous fungi Aspergillus niger. Two of them belonged to glycoside hydrolase family 1 (GH1) while the rest belonged to genes in family 3 (GH3). These proteins were then named according to the nomenclature as proposed by the International Union of Biochemistry (IUB), starting from the lowest pI and glycoside hydrolase family. Their properties were predicted using various bionformatic tools showing the presence of domains for signal peptide and active sites. Interestingly, one particular domain, PA14 (protective antigen) was present in four of the enzymes, predicted to be involved in carbohydrate binding. A phylogenetic tree grouped the two glycoside hydrolase families with GH1 and GH3 related organisms. This study showed that the various domains present in these β-glucosidases are postulated to be crucial for the survival of this fungus, as supported by other analysis.

  19. Analytical and computational approaches to define the Aspergillus niger secretome

    SciTech Connect

    Tsang, Adrian; Butler, Gregory D.; Powlowski, Justin; Panisko, Ellen A.; Baker, Scott E.

    2009-03-01

    We used computational and mass spectrometric approaches to characterize the Aspergillus niger secretome. The 11,200 gene models predicted in the genome of A. niger strain ATCC 1015 were the data source for the analysis. Depending on the computational methods used, 691 to 881 proteins were predicted to be secreted proteins. We cultured A. niger in six different media and analyzed the extracellular proteins produced using mass spectrometry. A total of 222 proteins were identified, with 39 proteins expressed under all six conditions and 74 proteins expressed under only one condition. The secreted proteins identified by mass spectrometry were used to guide the correction of about 20 gene models. Additional analysis focused on extracellular enzymes of interest for biomass processing. Of the 63 glycoside hydrolases predicted to be capable of hydrolyzing cellulose, hemicellulose or pectin, 94% of the exo-acting enzymes and only 18% of the endo-acting enzymes were experimentally detected.

  20. Scientific Advances with Aspergillus Species that Are Used for Food and Biotech Applications.

    PubMed

    Biesebeke, Rob Te; Record, Erik

    2008-01-01

    Yeast and filamentous fungi have been used for centuries in diverse biotechnological processes. Fungal fermentation technology is traditionally used in relation to food production, such as for bread, beer, cheese, sake and soy sauce. Last century, the industrial application of yeast and filamentous fungi expanded rapidly, with excellent examples such as purified enzymes and secondary metabolites (e.g. antibiotics), which are used in a wide range of food as well as non-food industries. Research on protein and/or metabolite secretion by fungal species has focused on identifying bottlenecks in (post-) transcriptional regulation of protein production, metabolic rerouting, morphology and the transit of proteins through the secretion pathway. In past years, genome sequencing of some fungi (e.g. Aspergillus oryzae, Aspergillus niger) has been completed. The available genome sequences have enabled identification of genes and functionally important regions of the genome. This has directed research to focus on a post-genomics era in which transcriptomics, proteomics and metabolomics methodologies will help to explore the scientific relevance and industrial application of fungal genome sequences. PMID:21558706