Science.gov

Sample records for aspergillus genome database

  1. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations

    PubMed Central

    Cerqueira, Gustavo C.; Arnaud, Martha B.; Inglis, Diane O.; Skrzypek, Marek S.; Binkley, Gail; Simison, Matt; Miyasato, Stuart R.; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R.

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome. PMID:24194595

  2. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. Plant Genome Duplication Database.

    PubMed

    Lee, Tae-Ho; Kim, Junah; Robertson, Jon S; Paterson, Andrew H

    2017-01-01

    Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/.

  4. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  5. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain errors of several kinds, e.g. missing genes, incorrect exon-intron structures, 'chimeras' that fuse two or more real genes, or, conversely, models that split a real gene into two or more pieces. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models the one that most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For the Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome) supported by comparative analysis, and additionally corrected ~18% of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.
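
The iterate-until-stable selection described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the representation of a gene model as a tuple of exon lengths, the `structure_distance` score, the `refine` function and the genome labels are all hypothetical choices made only to show the idea of picking, at each locus, the competing model that best matches the orthologs and repeating until nothing changes.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a gene model: a tuple of exon lengths.
Model = Tuple[int, ...]


def structure_distance(a: Model, b: Model) -> int:
    """Toy dissimilarity: heavy penalty for differing exon counts,
    plus summed differences of the exon lengths that can be paired."""
    paired = sum(abs(x - y) for x, y in zip(a, b))
    return 1000 * abs(len(a) - len(b)) + paired


def refine(current: Dict[str, Model],
           candidates: Dict[str, List[Model]],
           max_rounds: int = 100) -> Dict[str, Model]:
    """For one orthologous family, repeatedly re-pick each genome's model.

    current    maps genome name -> currently chosen model
    candidates maps genome name -> competing models at that locus
    The loop stops when a full pass changes nothing (or after max_rounds,
    a guard added here against oscillation).
    """
    chosen = dict(current)
    for _ in range(max_rounds):
        changed = False
        for genome, options in candidates.items():
            others = [m for g, m in chosen.items() if g != genome]
            # Pick the candidate closest, in total, to the orthologs elsewhere.
            best = min(options,
                       key=lambda m: sum(structure_distance(m, o) for o in others))
            if best != chosen[genome]:
                chosen[genome] = best
                changed = True
        if not changed:
            break
    return chosen


# Hypothetical family: genome C starts with a one-exon model although its
# orthologs in A and B have two exons; a two-exon alternative is available.
start = {"A": (100, 200), "B": (100, 200), "C": (300,)}
options = {"A": [(100, 200)], "B": [(100, 200)], "C": [(300,), (100, 205)]}
result = refine(start, options)
```

In this toy family the two-exon candidate for genome C wins because it agrees with the exon structure of the other two genomes. A real pipeline would score alignments of mapped models rather than exon-length tuples.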

  6. Genome sequence of Aspergillus luchuensis NBRC 4314

    PubMed Central

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. Two microbes are involved in brewing the liquor: the local kuro (black) koji mold Aspergillus luchuensis and the awamori yeast Saccharomyces cerevisiae. Whereas yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as those for transcription factors and the N- and O-glycosylation systems, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and an acid-stable α-amylase, relevant to citric acid production and fermentation at low pH, as well as a unique glutamic peptidase, were also found in the genome. Furthermore, key biosynthetic gene clusters for ochratoxin A and fumonisin B were absent when compared with the A. niger genome, supporting the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds to improve awamori fermentation. PMID:27651094

  7. Mouse genome database 2016

    PubMed Central

    Bult, Carol J.; Eppig, Janan T.; Blake, Judith A.; Kadin, James A.; Richardson, Joel E.

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  8. Mouse genome database 2016.

    PubMed

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-04

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data.

  9. Genomic Islands in Pathogenic Filamentous Fungus Aspergillus fumigatus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present the genome sequences of a new clinical isolate, CEA10, of an important human pathogen, Aspergillus fumigatus, and two closely related, but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of CEA10 with the recently sequen...

  10. [Comparison of genomes between Aspergillus nidulans and 30 filamentous ascomycetes].

    PubMed

    Zeng, Zhao-Qing; Zhao, Fu-Yong; Hsiang, Tom; Yu, Zhi-He

    2010-11-01

    To investigate the conserved homologs of filamentous ascomycete genomes, a local fungal genome database was established for this analysis, consisting of the 31 latest complete genomes publicly available on the Internet. An expectation value (E-value) cutoff of 0.1 was used to identify significant hits. The complete gene set of the query genome, Aspergillus nidulans, with 10,560 annotated genes, was split into individual FASTA files with Seqverter, and each was then compared separately against each filamentous ascomycete genome using standalone BLASTN. The results indicated that the number of matches reflected the evolutionary relationships of the filamentous ascomycetes analysed. Of the 10,560 genes in the Aspergillus nidulans genome, 924 had matching sequences in the other 30 filamentous ascomycetes. The numbers of homologous sequences were 6, 3, 6, and 6 at E-values in the ranges of 10^-5 to 0.1, 10^-30 to 10^-5, 10^-100 to 10^-30 and 0 to 10^-100, respectively. The six homologs at E-values ranging from 10^-5 to 0.1 and the three at E-values ranging from 10^-30 to 10^-5 were variable, while the six at E-values ranging from 0 to 10^-100 were highly conserved based on alignments using ClustalX. The six homologs at E-values in the range of 10^-100 to 10^-30 were relatively conserved and can be used for the phylogeny of the filamentous ascomycetes in this study.
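
The E-value binning reported in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bin boundaries are taken from the abstract, and treating each upper bound as inclusive is a choice made here, since the abstract does not say which bin a boundary value belongs to.

```python
from collections import Counter
from typing import Iterable, Optional

CUTOFF = 0.1  # E-value cutoff stated in the abstract

# (upper bound, label) pairs in ascending order; upper bounds are treated
# as inclusive, which is an assumption of this sketch.
BINS = [
    (1e-100, "0 to 1e-100"),
    (1e-30, "1e-100 to 1e-30"),
    (1e-5, "1e-30 to 1e-5"),
    (CUTOFF, "1e-5 to 0.1"),
]


def bin_label(evalue: float) -> Optional[str]:
    """Return the label of the E-value range, or None above the cutoff."""
    for upper, label in BINS:
        if evalue <= upper:
            return label
    return None


def count_bins(evalues: Iterable[float]) -> Counter:
    """Count hits per E-value range, discarding hits above the cutoff."""
    labels = (bin_label(e) for e in evalues)
    return Counter(label for label in labels if label is not None)


# Hypothetical E-values, not data from the study.
example = count_bins([0.0, 1e-120, 1e-50, 1e-10, 0.01, 0.5])
```

The E-values themselves would come from the BLASTN output; how they are parsed depends on the output format chosen and is left out here.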

  11. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  12. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  13. GOLD: The Genomes Online Database

    DOE Data Explorer

    Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex

    Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to archaeal, bacterial, eukaryotic and, more recently, metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited in public archival sequence databases such as GenBank, EMBL, and DDBJ. Of the 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects: 1328 bacterial, 59 archaeal and 771 eukaryotic. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]
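
As a quick consistency check on the project counts quoted above, the per-domain figures can be summed and compared with the stated totals. The dictionary and function names below are illustrative only; the numbers are those given in the record.

```python
from typing import Dict

# Counts as quoted in the record (September 2007).
completed = {"bacterial": 527, "archaeal": 47, "eukaryotic": 65}
ongoing = {"bacterial": 1328, "archaeal": 59, "eukaryotic": 771}


def total(counts: Dict[str, int]) -> int:
    """Sum the per-domain project counts."""
    return sum(counts.values())


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of projects contributed by each domain."""
    n = total(counts)
    return {domain: 100.0 * c / n for domain, c in counts.items()}


completed_total = total(completed)  # 639, matching the stated total
ongoing_total = total(ongoing)      # 2158, matching the stated total
completed_shares = shares(completed)
```

Both sums agree with the totals stated in the record (639 completed and 2158 ongoing projects).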

    The basic tables in the GOLD database that can be browsed or searched include the following information:

    • Gold Stamp ID
    • Organism name
    • Domain
    • Links to information sources
    • Size and link to a map, when available
    • Chromosome number, plasmid number, and GC content
    • A link for downloading the actual genome data
    • Institution that did the sequencing
    • Funding source
    • Database where information resides
    • Publication status and information

    • Maize Genetics and Genomics Database

      Technology Transfer Automated Retrieval System (TEKTRAN)

      The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently all sequence in the database comes from a PlantGDB pipeline and is presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

    • What can comparative genomics tell us about species concepts in the genus Aspergillus?

      SciTech Connect

      Rokas, Antonis; Payne, Gary; Federova, Natalie D.; Baker, Scott E.; Machida, Masa; Yu, Jiujiang; Georgianna, D. R.; Dean, Ralph A.; Bhatnagar, Deepak; Cleveland, T. E.; Wortman, Jennifer R.; Maiti, R.; Joardar, V.; Amedeo, Paolo; Denning, David W.; Nierman, William C.

      2007-12-15

      Understanding the nature of species' boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.

    • The 2008 update of the Aspergillus nidulans genome annotation: a community effort

      PubMed Central

      Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J.M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W.J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A.K.W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Kraševec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J.; Ram, Arthur F.J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; Solingen, Piet van; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A.; Visser, Hans; van de Vondervoort, Peter J.I.; de Vries, Ronald P.; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A.M.J.J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey

      2010-01-01

      The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. PMID:19146970

    • Comparative Genomics of Aspergillus flavus and A. oryzae: An Early View

      Technology Transfer Automated Retrieval System (TEKTRAN)

      Aspergillus flavus produces aflatoxins and is the second leading cause of aspergillosis in immunocompromised individuals. Aspergillus oryzae, on the other hand, has been used for centuries in Japan for the fermentation of food. The recently available whole genome sequences of Aspergillus flavus an...

    • The GDB Human Genome Database Anno 1997.

      PubMed Central

      Fasman, K H; Letovsky, S I; Li, P; Cottingham, R W; Kingsbury, D T

      1997-01-01

      The value of the Genome Database (GDB) for the human genome research community has been greatly increased since the release of version 6.0 last year. Thanks to the introduction of significant technical improvements, GDB has seen dramatic growth in the type and volume of information stored in the database. This article summarizes the types of data that are now available in the Genome Database, demonstrates how the database is interconnected with other biomedical resources on the World Wide Web, discusses how researchers can contribute new or updated information to the database, and describes our current efforts as well as planned improvements for the future. PMID:9016507

    • Aspergillus niger Genomics: Past, Present and into the Future

      SciTech Connect

      Baker, Scott E.

      2006-09-01

      Aspergillus niger is a filamentous ascomycete fungus that is ubiquitous in the environment and has been implicated in opportunistic infections of humans. In addition to its role as an opportunistic human pathogen, A. niger is economically important as a fermentation organism used for the production of citric acid. Industrial citric acid production by A. niger represents one of the most efficient, highest yield bioprocesses in use currently by industry. The genome size of A. niger is estimated to be between 35.5 and 38.5 megabases (Mb) divided among eight chromosomes/linkage groups that vary in size from 3.5 - 6.6 Mb. Currently, there are three independent A. niger genome projects, an indication of the economic importance of this organism. The rich amount of data resulting from these multiple A. niger genome sequences will be used for basic and applied research programs applicable to fermentation process development, morphology and pathogenicity.

    • Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

      PubMed Central

      Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

      2016-01-01

      We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

  1. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

    PubMed

    Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-04

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search.

  2. Draft Genome Sequence of an Aflatoxigenic Aspergillus Species, A. bombycis

    PubMed Central

    Moore, Geromy G.; Mack, Brian M.; Beltz, Shannon B.; Gilbert, Matthew K.

    2016-01-01

    Aspergillus bombycis was first isolated from silkworm frass in Japan. It has been reportedly misidentified as A. nomius due to their macro-morphological and chemotype similarities. We sequenced the genome of the A. bombycis Type strain and found it to be comparable in size (37 Mb), as well as in numbers of predicted genes (12,266), to other sequenced Aspergilli. The aflatoxin gene cluster in this strain is similar in size and the genes are oriented the same as other B- + G-aflatoxin producing species, and this strain contains a complete but nonfunctional gene cluster for the production of cyclopiazonic acid. Our findings also showed that the A. bombycis Type strain contains a single MAT1-2 gene indicating that this species is likely heterothallic (self-infertile). This draft genome will contribute to our understanding of the genes and pathways necessary for aflatoxin synthesis as well as the evolutionary relationships of aflatoxigenic fungi. PMID:27664179

  3. The UCSC Genome Browser database: 2015 update.

    PubMed

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.

  4. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE PAGES

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; ...

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has proved useful for whole-genome sequencing of bacterial genomes (5 Mbp). In our study, we used this technology for the first time to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatics analyses on genes involved in OTA biosynthesis, using a reference genome of an OTA-producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a twofold increase in gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.

  5. The UCSC Genome Browser Database: update 2006.

    PubMed

    Hinrichs, A S; Karolchik, D; Baertsch, R; Barber, G P; Bejerano, G; Clawson, H; Diekhans, M; Furey, T S; Harte, R A; Hsu, F; Hillman-Jackson, J; Kuhn, R M; Pedersen, J S; Pohl, A; Raney, B J; Rosenbloom, K R; Siepel, A; Smith, K E; Sugnet, C W; Sultan-Qurraie, A; Thomas, D J; Trumbower, H; Weber, R J; Weirauch, M; Zweig, A S; Haussler, D; Kent, W J

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

  6. Genomics of Compensatory Adaptation in Experimental Populations of Aspergillus nidulans

    PubMed Central

    Dettman, Jeremy R.; Rodrigue, Nicolas; Schoustra, Sijmen E.; Kassen, Rees

    2016-01-01

    Knowledge of the number and nature of genetic changes responsible for adaptation is essential for understanding and predicting evolutionary trajectories. Here, we study the genomic basis of compensatory adaptation to the fitness cost of fungicide resistance in experimentally evolved strains of the filamentous fungus Aspergillus nidulans. The original selection experiment tracked the fitness recovery of lines founded by an ancestral strain that was resistant to fludioxonil, but paid a fitness cost in the absence of the fungicide. We obtained whole-genome sequence data for the ancestral A. nidulans strain and eight experimentally evolved strains. We find that fludioxonil resistance in the ancestor was likely conferred by a mutation in histidine kinase nikA, part of the two-component signal transduction system of the high-osmolarity glycerol (HOG) stress response pathway. To compensate for the pleiotropic negative effects of the resistance mutation, the subsequent fitness gains observed in the evolved lines were likely caused by secondary modification of HOG pathway activity. Candidate genes for the compensatory fitness increases were significantly overrepresented by stress response functions, and some were specifically associated with the HOG pathway itself. Parallel evolution at the gene level was rare among evolved lines. There was a positive relationship between the predicted number of adaptive steps, estimated from fitness data, and the number of genomic mutations, determined by whole-genome sequencing. However, the number of genomic mutations was, on average, 8.45 times greater than the number of adaptive steps inferred from fitness data. This research expands our understanding of the genetic basis of adaptation in multicellular eukaryotes and lays out a framework for future work on the genomics of compensatory adaptation in A. nidulans. PMID:27903631

  7. BGD: a database of bat genomes.

    PubMed

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. To date, several whole-genome sequences of bats have been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  8. The UCSC Genome Browser database: 2016 update

    PubMed Central

    Speir, Matthew L.; Zweig, Ann S.; Rosenbloom, Kate R.; Raney, Brian J.; Paten, Benedict; Nejad, Parisa; Lee, Brian T.; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S.; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A.; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  9. The UCSC Genome Browser database: 2016 update.

    PubMed

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-04

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

  10. Complete mitochondrial genome of an Amynthas earthworm, Amynthas aspergillus (Oligochaeta: Megascolecidae).

    PubMed

    Zhang, Liangliang; Jiang, Jibao; Dong, Yan; Qiu, Jiangping

    2016-05-01

    We have determined the mitochondrial genome of the first Amynthas earthworm, Amynthas aspergillus (Perrier, 1872), which is a natural medical resource in Chinese traditional medicine. Its mitogenome is 15,115 bp in length and contains 37 genes with the same contents and order as other sequenced earthworms. All genes are encoded on the same strand, and all 13 protein-coding genes (PCGs) use ATG as the start codon. The content of A + T is 63.04% for A. aspergillus (33.41% A, 29.63% T, 14.56% G and 22.41% C). The complete mitochondrial genome of A. aspergillus would be useful for the reconstruction of Oligochaeta phylogenetic relationships.
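    The base-composition figures in the abstract above can be checked with simple arithmetic. The sketch below is illustrative only: the function names `base_composition` and `at_content` are hypothetical helpers, not taken from the paper, and the reported percentages are copied from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of each base (A, C, G, T) in a DNA sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(composition):
    """A + T content, in percent, from a composition dictionary."""
    return composition["A"] + composition["T"]


# Percentages as reported in the abstract (not recomputed from the sequence).
reported = {"A": 33.41, "T": 29.63, "G": 14.56, "C": 22.41}

print(round(at_content(reported), 2))     # 63.04, matching the stated A + T content
print(round(sum(reported.values()), 2))   # 100.01, the excess is rounding in the reported values
```

    With the full mitogenome sequence in hand, `base_composition` could be applied to it directly to reproduce the per-base percentages.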

  11. The Saccharomyces Genome Database Variant Viewer

    PubMed Central

    Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  12. The UCSC Genome Browser database: 2014 update

    PubMed Central

    Karolchik, Donna; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S.; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hinrichs, Angie S.; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Raney, Brian J.; Rhead, Brooke; Rosenbloom, Kate R.; Sloan, Cricket A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser’s web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation ‘tracks’ for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany. PMID:24270787

  13. The UCSC Genome Browser database: 2014 update.

    PubMed

    Karolchik, Donna; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hinrichs, Angie S; Learned, Katrina; Lee, Brian T; Li, Chin H; Raney, Brian J; Rhead, Brooke; Rosenbloom, Kate R; Sloan, Cricket A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.

  14. WGE: a CRISPR database for genome engineering

    PubMed Central

    Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J.; Skarnes, William C.; Iyer, Vivek

    2015-01-01

    Summary: The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. Availability and implementation: The WGE database is freely available at www.sanger.ac.uk/htgt/wge Contact: vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25979474
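    As a rough illustration of what "finding CRISPR sites" involves, the sketch below scans the forward strand of a sequence for 20-nt protospacers followed by an NGG PAM, the motif recognized by the commonly used Streptococcus pyogenes Cas9. It is a minimal example under those assumptions and is not the method used by WGE; the function name `find_crispr_sites` is hypothetical, and off-target scoring and the reverse strand are not covered.

```python
import re


def find_crispr_sites(seq):
    """Find candidate Cas9 sites on the forward strand.

    Returns a list of (start, protospacer, pam) tuples, where the
    protospacer is 20 nt long and is immediately followed by an NGG PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        sites.append((match.start(), match.group(1), match.group(2)))
    return sites


example = "A" * 20 + "TGG"
for start, protospacer, pam in find_crispr_sites(example):
    print(start, protospacer, pam)
```

    Sites on the reverse strand could be found by running the same scan on the reverse complement of the sequence.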

  15. Bovine Genome Database: integrated tools for genome annotation and discovery.

    PubMed

    Childers, Christopher P; Reese, Justin T; Sundaram, Jaideep P; Vile, Donald C; Dickens, C Michael; Childs, Kevin L; Salih, Hanni; Bennett, Anna K; Hagen, Darren E; Adelson, David L; Elsik, Christine G

    2011-01-01

    The Bovine Genome Database (BGD; http://BovineGenome.org) strives to improve annotation of the bovine genome and to integrate the genome sequence with other genomics data. BGD includes GBrowse genome browsers, the Apollo Annotation Editor, a quantitative trait loci (QTL) viewer, BLAST databases and gene pages. Genome browsers, available for both scaffold and chromosome coordinate systems, display the bovine Official Gene Set (OGS), RefSeq and Ensembl gene models, non-coding RNA, repeats, pseudogenes, single-nucleotide polymorphism, markers, QTL and alignments to complementary DNAs, ESTs and protein homologs. The Bovine QTL viewer is connected to the BGD Chromosome GBrowse, allowing for the identification of candidate genes underlying QTL. The Apollo Annotation Editor connects directly to the BGD Chado database to provide researchers with remote access to gene evidence in a graphical interface that allows editing and creating new gene models. Researchers may upload their annotations to the BGD server for review and integration into the subsequent release of the OGS. Gene pages display information for individual OGS gene models, including gene structure, transcript variants, functional descriptions, gene symbols, Gene Ontology terms, annotator comments and links to National Center for Biotechnology Information and Ensembl. Each gene page is linked to a wiki page to allow input from the research community.

  16. The UCSC Genome Browser database: 2015 update

    PubMed Central

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  17. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations is a common research need. The data can be saved in a database system for easy management; however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).
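    To make the notion of "identifying overlapping regions" in SQL concrete, here is a minimal sketch using a plain range join in an in-memory SQLite database. It is not the RegMap algorithm described in the abstract, which is not specified here; the table names, column names and the function `find_overlaps` are assumptions for illustration, and coordinates are treated as inclusive.

```python
import sqlite3


def find_overlaps(regions_a, regions_b):
    """Return overlapping region pairs as (chrom, a_start, a_end, b_start, b_end).

    Each input region is a (chrom, start, end) tuple with inclusive coordinates.
    Two regions overlap when they share a chromosome and each one starts
    no later than the other ends.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (chrom TEXT, start_pos INTEGER, end_pos INTEGER)")
    conn.execute("CREATE TABLE b (chrom TEXT, start_pos INTEGER, end_pos INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", regions_a)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", regions_b)
    rows = conn.execute(
        """
        SELECT a.chrom, a.start_pos, a.end_pos, b.start_pos, b.end_pos
        FROM a
        JOIN b ON a.chrom = b.chrom
              AND a.start_pos <= b.end_pos
              AND b.start_pos <= a.end_pos
        ORDER BY a.chrom, a.start_pos, b.start_pos
        """
    ).fetchall()
    conn.close()
    return rows


binding_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 300), ("chr1", 601, 700), ("chr2", 50, 99)]
print(find_overlaps(binding_sites, genes))
```

    A naive join like this compares every pair of regions on the same chromosome, which is the kind of cost that a dedicated region-mapping scheme or index would aim to reduce.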

  18. What Can Comparative Genomics Tell Us About Species Concepts in the Genus Aspergillus?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Understanding the nature of species’ boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four ca...

  19. Orchidstra: an integrated orchid functional genomics database.

    PubMed

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-02-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species.

  20. The UCSC Genome Browser database: 2017 update.

    PubMed

    Tyner, Cath; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Eisenhart, Christopher; Fischer, Clayton M; Gibson, David; Gonzalez, Jairo Navarro; Guruvadoo, Luvina; Haeussler, Maximilian; Heitner, Steve; Hinrichs, Angie S; Karolchik, Donna; Lee, Brian T; Lee, Christopher M; Nejad, Parisa; Raney, Brian J; Rosenbloom, Kate R; Speir, Matthew L; Villarreal, Chris; Vivian, John; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2017-01-04

    Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.

  1. The UCSC Genome Browser database: 2017 update

    PubMed Central

    Tyner, Cath; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Eisenhart, Christopher; Fischer, Clayton M.; Gibson, David; Gonzalez, Jairo Navarro; Guruvadoo, Luvina; Haeussler, Maximilian; Heitner, Steve; Hinrichs, Angie S.; Karolchik, Donna; Lee, Brian T.; Lee, Christopher M.; Nejad, Parisa; Raney, Brian J.; Rosenbloom, Kate R.; Speir, Matthew L.; Villarreal, Chris; Vivian, John; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2017-01-01

    Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new ‘multi-region’ track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan. PMID:27899642

  2. Requirements and standards for organelle genome databases

    SciTech Connect

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles) descended from prokaryotes that adopted an intracellular, endosymbiotic lifestyle within early eukaryotes. Comparisons of their remnant genomes address a wide variety of biological questions, especially when including the genomes of their prokaryotic relatives and the many genes transferred to the eukaryotic nucleus during the transitions from endosymbiont to organelle. The pace of producing complete organellar genome sequences now makes it unfeasible to do broad comparisons using the primary literature and, even if it were feasible, it is now becoming uncommon for journals to accept detailed descriptions of genome-level features. Unfortunately no database is currently useful for this task, since they have little standardization and are riddled with error. Here I outline what is currently wrong and what must be done to make this data useful to the scientific community.

  3. How to use the Candida Genome Database

    PubMed Central

    Skrzypek, Marek S.; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Summary: Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  4. Complete Genome Sequence of Soil Fungus Aspergillus terreus (KM017963), a Potent Lovastatin Producer

    PubMed Central

    Bhargavi, S. D.; Praveen, V. K.

    2016-01-01

    We report the complete genome of Aspergillus terreus (KM017963), a tropical soil isolate. The genome sequence is 29 Mb, with a G+C content of 51.12%. The genome sequence of A. terreus shows the presence of the complete gene cluster responsible for lovastatin (an anti-cholesterol drug) production in a single scaffold (1.16). PMID:27284150

  5. Genome mining and functional genomics for siderophore production in Aspergillus niger.

    PubMed

    Franken, Angelique C W; Lechner, Beatrix E; Werner, Ernst R; Haas, Hubertus; Lokman, B Christien; Ram, Arthur F J; van den Hondel, Cees A M J J; de Weert, Sandra; Punt, Peter J

    2014-11-01

    Iron is an essential metal for many organisms, but the biologically relevant form of iron is scarce because of rapid oxidation resulting in low solubility. Simultaneously, excessive accumulation of iron is toxic. Consequently, iron uptake is a highly controlled process. In most fungal species, siderophores play a central role in iron handling. Siderophores are small iron-specific chelators that can be secreted to scavenge environmental iron or bind intracellular iron with high affinity. A second high-affinity iron uptake mechanism is reductive iron assimilation (RIA). As shown in Aspergillus fumigatus and Aspergillus nidulans, synthesis of siderophores in Aspergilli is predominantly under control of the transcription factors SreA and HapX, which are connected by a negative transcriptional feedback loop. Abolishing this fine-tuned regulation corroborates iron homeostasis, including heme biosynthesis, which could be biotechnologically of interest, e.g. the heterologous production of heme-dependent peroxidases. Aspergillus niger genome inspection identified orthologues of several genes relevant for RIA and siderophore metabolism, as well as sreA and hapX. Interestingly, genes related to synthesis of the common fungal extracellular siderophore triacetylfusarinine C were absent. Reverse-phase high-performance liquid chromatography (HPLC) confirmed the absence of triacetylfusarinine C, and demonstrated that the major secreted siderophores of A. niger are coprogen B and ferrichrome, which is also the dominant intracellular siderophore. In A. niger wild type grown under iron-replete conditions, the expression of genes involved in coprogen biosynthesis and RIA was low in the exponential growth phase but significantly induced during ascospore germination. Deletion of sreA in A. niger resulted in elevated iron uptake and increased cellular ferrichrome accumulation. Increased sensitivity toward phleomycin and high iron concentration reflected the toxic effects of excessive

  6. Potential of Aspergillus flavus Genomics for Applications in Biotechnology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a common saprophyte and opportunistic pathogen that survives in the natural environment by extracting nutrition from plant debris, insect carcasses and a variety of other carbon sources. A. flavus produces numerous secondary metabolites and hydrolytic enzymes. The primary obj...

  7. Genomic sequence of the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus nomius is an opportunistic pathogen and one of the three most important producers of aflatoxins in section Flavi. This fungus has been reported to contaminate agricultural commodities, but it has also been sampled in non-agricultural soils so the host range is not well known. Having a si...

  8. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-07-14

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of how the virulence characteristics of the ISS strains compare with those of other clinical strains isolated on Earth.

  9. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station

    PubMed Central

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of how the virulence characteristics of the ISS strains compare with those of other clinical strains isolated on Earth. PMID:27417828

  10. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  11. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and A. parasiticus, fungi that produce carcinogenic mycotoxins, infect peanut seeds, with considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  12. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Aziz, Md Momin Al; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2016-11-04

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic settings. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomes is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records take around 100 and 150 seconds, respectively.

  13. CyanoBase: the cyanobacteria genome database update 2010.

    PubMed

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  14. Automatic query mapping among genomic databases: a pilot exploration.

    PubMed Central

    Cheung, K. H.; Nadkarni, P.; Miller, P.; Shin, D. G.

    1998-01-01

    As databases in the human genome project proliferate, it is important for users of one genomic database to identify similar or inconsistent data in other autonomously developed genomic databases. To do so, the user needs to issue the same query across multiple databases. We describe an approach that allows a query issued against one database to be automatically mapped to an equivalent query against another structurally different database. Our approach features two components: 1) a database designed to capture knowledge (metadata) that describes the correspondences among individual database components and 2) a module that utilizes the metadata to perform query mappings. As a demonstration, we apply our query mapping approach to two chromosome map databases (DB/12 and GDB). PMID:9929357
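
    The two components described above (a metadata store of correspondences and a module that uses it to map queries) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the field names, the dictionary layout and the `map_query` helper are hypothetical and do not reflect the actual DB/12 or GDB schemas or the authors' implementation.

```python
# Hypothetical correspondence metadata:
#   (source database, source field) -> (target database, target field)
# The field names are invented for this example; they are NOT the real
# DB/12 or GDB schemas.
CORRESPONDENCES = {
    ("DB12", "locus.name"): ("GDB", "marker.symbol"),
    ("DB12", "locus.chromosome"): ("GDB", "marker.chrom"),
}


def map_query(query, source_db, target_db, correspondences=CORRESPONDENCES):
    """Translate {field: value} conditions from source_db to target_db.

    Returns a pair (mapped_conditions, unmapped_fields) so that the caller
    can see which conditions have no known equivalent in the target.
    """
    mapped, unmapped = {}, []
    for field, value in query.items():
        target = correspondences.get((source_db, field))
        if target is not None and target[0] == target_db:
            mapped[target[1]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


# A query phrased against the (hypothetical) source schema.
query = {"locus.name": "D12S1", "locus.chromosome": "12", "locus.comment": "x"}
mapped, unmapped = map_query(query, "DB12", "GDB")
print(mapped)    # conditions rewritten in the target schema's terms
print(unmapped)  # fields with no recorded correspondence
```

    Keeping the correspondences as data rather than code means a new database pair only needs new metadata entries, which is the separation the abstract describes.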

  15. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  16. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  17. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    PubMed Central

    Guo, Chun-Jun; Wang, Clay C. C.

    2014-01-01

    Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is the well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs, including acetylaranotin, butyrolactones, and territrem, with interesting bioactivities. This review will cover recent progress in genome mining of SMs identified in this fungus. The identification and characterization of the gene clusters for these SMs, as well as the proposed biosynthetic pathways, will be discussed in depth. PMID:25566227

  18. Plant database resources at The Institute for Genomic Research.

    PubMed

    Chan, Agnes P; Rabinowicz, Pablo D; Quackenbush, John; Buell, C Robin; Town, Chris D

    2007-01-01

    With the completion of the genome sequences of the model plants Arabidopsis and rice, and the continuing sequencing efforts of other economically important crop plants, an unprecedented amount of genome sequence data is now available for large-scale genomics studies and analyses, such as the identification and discovery of novel genes, comparative genomics, and functional genomics. Efficient utilization of these large data sets is critically dependent on the ease of access and organization of the data. The plant databases at The Institute for Genomic Research (TIGR) have been set up to maintain various data types including genomic sequence, annotation and analyses, expressed transcript assemblies and analyses, and gene expression profiles from microarray studies. We present here an overview of the TIGR database resources for plant genomics and describe methods to access the data.

  19. Recent updates and developments to plant genome size databases

    PubMed Central

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  20. Recent updates and developments to plant genome size databases.

    PubMed

    Garcia, Sònia; Leitch, Ilia J; Anadon-Rosell, Alba; Canela, Miguel Á; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols.

  1. Gramene database: navigating plant comparative genomics resources

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  2. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    PubMed

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

  3. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

    PubMed Central

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  4. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome-scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms: Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected; however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size.
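
    To make the role of currency metabolites in path finding concrete, here is a minimal Python sketch of a breadth-first shortest-path search over a toy metabolite graph. It is only a common simplification and not the method of the paper, whose notion of "biologically feasible" paths is not spelled out in the abstract. The three reactions, the metabolite names and the `shortest_path` helper are all assumptions made for illustration.

```python
from collections import deque


def shortest_path(reactions, start, goal, currency=frozenset()):
    """Breadth-first search for a shortest metabolite-to-metabolite path.

    `reactions` is a list of (substrates, products) pairs, treated as
    irreversible. Metabolites listed in `currency` (e.g. ATP/ADP) are
    left out of the graph so they cannot act as shortcuts.
    """
    graph = {}
    for substrates, products in reactions:
        for s in substrates:
            if s in currency:
                continue
            for p in products:
                if p not in currency:
                    graph.setdefault(s, set()).add(p)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):  # sorted for determinism
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal not reachable


# Toy network (hypothetical, heavily simplified).
REACTIONS = [
    (("glucose", "ATP"), ("G6P", "ADP")),
    (("G6P",), ("F6P",)),
    (("ADP", "PEP"), ("ATP", "pyruvate")),
]
CURRENCY = frozenset({"ATP", "ADP"})

naive = shortest_path(REACTIONS, "glucose", "pyruvate")
filtered = shortest_path(REACTIONS, "glucose", "pyruvate", CURRENCY)
print(naive)     # routes through ADP, a chemically meaningless shortcut
print(filtered)  # None: no path once currency metabolites are excluded
```

    In this toy network the unfiltered search links glucose to pyruvate through ADP, although no carbon flows along that route; excluding the currency metabolites removes the spurious connection.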

  5. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  6. viruSITE—integrated database for viral genomics

    PubMed Central

    Stano, Matej; Beke, Gabor; Klucar, Lubos

    2016-01-01

    Viruses are the most abundant biological entities and the reservoir of most of the genetic diversity in the Earth's biosphere. Viral genomes are very diverse, generally short in length and, compared to other organisms, carry only a few genes. viruSITE is a novel database which brings together high-value information compiled from various resources. viruSITE covers the whole universe of viruses and focuses on viral genomes, genes and proteins. The database contains information on virus taxonomy, host range, genome features, sequential relatedness as well as the properties and functions of viral genes and proteins. All entries in the database are linked to numerous information resources. The above-mentioned features make viruSITE a comprehensive knowledge hub in the field of viral genomics. The web interface of the database was designed so as to offer an easy-to-navigate, intuitive and user-friendly environment. It provides sophisticated text searching and a taxonomy-based browsing system. viruSITE also allows for an alternative approach based on sequence search. A proprietary genome browser generates a graphical representation of viral genomes. In addition to retrieving and visualising data, users can perform comparative genomics analyses using a variety of tools. Database URL: http://www.virusite.org/ PMID:28025349

  7. Uniform standards for genome databases in forest and fruit trees

    Technology Transfer Automated Retrieval System (TEKTRAN)

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  8. Design and implementation of the cacao genome database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  9. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error/10,000 bp), validated through a number of computer and laboratory experiments.
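
    The error rates quoted above can be turned into expected error counts with simple arithmetic. The short Python sketch below does this for a genome size of 5 Mb, which is a hypothetical value chosen only for illustration; the rates (1 error per 2000 bp for a draft, about 1 per 10,000 bp for a finished genome) and the 98% coverage figure come from the abstract.

```python
def expected_errors(genome_size_bp, bp_per_error):
    """Expected number of consensus errors at one error per `bp_per_error` bases."""
    return genome_size_bp // bp_per_error


GENOME_SIZE_BP = 5_000_000  # hypothetical genome size, for illustration only

draft_errors = expected_errors(GENOME_SIZE_BP, 2_000)      # draft: 1 error / 2000 bp
finished_errors = expected_errors(GENOME_SIZE_BP, 10_000)  # finished: ~1 error / 10,000 bp

# A draft covering up to 98% of the genome leaves at least 2% uncovered.
uncovered_bp = GENOME_SIZE_BP * 2 // 100

print(draft_errors, finished_errors, uncovered_bp)
```

    Under these assumptions, finishing reduces the expected number of consensus errors fivefold, in addition to closing the gaps left by the draft.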

  10. Epidemiological and Genomic Landscape of Azole Resistance Mechanisms in Aspergillus Fungi

    PubMed Central

    Hagiwara, Daisuke; Watanabe, Akira; Kamei, Katsuhiko; Goldman, Gustavo H.

    2016-01-01

    Invasive aspergillosis is a life-threatening mycosis caused by the pathogenic fungus Aspergillus. The predominant causal species is Aspergillus fumigatus, and azole drugs are the treatment of choice. Azole drugs approved for clinical use include itraconazole, voriconazole, posaconazole, and the recently added isavuconazole. However, epidemiological research has indicated that the prevalence of azole-resistant A. fumigatus isolates has increased significantly over the last decade. What is worse is that azole-resistant strains are likely to have emerged not only in response to long-term drug treatment but also because of exposure to azole fungicides in the environment. Resistance mechanisms include amino acid substitutions in the target Cyp51A protein, tandem repeat sequence insertions at the cyp51A promoter, and overexpression of the ABC transporter Cdr1B. Environmental azole-resistant strains harboring a combination of a tandem repeat sequence and point mutations in the cyp51A gene (TR34/L98H and TR46/Y121F/T289A) have become widely disseminated across the world within a short time period. The epidemiological data also suggest that the number of isolates of Aspergillus spp. other than A. fumigatus has risen. Some non-fumigatus species intrinsically show low susceptibility to azole drugs, imposing the need for accurate identification and drug susceptibility testing in most clinical cases. Currently, our knowledge of azole resistance mechanisms in non-fumigatus Aspergillus species such as A. flavus, A. niger, A. tubingensis, A. terreus, A. fischeri, A. lentulus, A. udagawae, and A. calidoustus is limited. In this review, we present recent advances in our understanding of azole resistance mechanisms, particularly in A. fumigatus. We then provide an overview of the genome sequences of non-fumigatus species, focusing on the proteins related to azole resistance mechanisms. PMID:27708619

  11. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  12. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  13. Prokaryotic Genomes from Microbes Online Database

    DOE Data Explorer

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche, Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    To describe the potential functions of genes, MicrobesOnline includes protein family analyses (from InterPro and COG), metabolic maps (from KEGG), links to research papers (from UniProt and PubMed), and operon predictions for every genome. To examine each gene's evolutionary history, MicrobesOnline includes precomputed phylogenetic trees for all the gene families. It displays gene trees with genomic context or it compares the gene tree to the species tree. The tools provided with MicrobesOnline allow users to compute customized motifs, sequence alignments, and phylogenetic trees; change expression patterns in metabolic maps; and annotate genes in various ways. A browse tree tool and a genome browser are available, along with specialized search capabilities. (Specialized Interface)

  14. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tabs of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org.

  15. OryzaGenome: Genome Diversity Database of Wild Oryza Species.

    PubMed

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.
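
    Since the variant data are offered as VCF or tab-delimited downloads, a minimal Python sketch of reading the fixed leading columns of a VCF file (CHROM, POS, ID, REF, ALT) may be helpful. The example records, chromosome names and helper functions below are hypothetical and are not taken from OryzaGenome; the SNP test is deliberately simplified and ignores special ALT values such as `.` or `*`.

```python
def parse_vcf(lines):
    """Parse the first five tab-separated VCF columns of each data line."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        chrom, pos, var_id, ref, alt = line.split("\t")[:5]
        records.append({
            "chrom": chrom,
            "pos": int(pos),        # 1-based position
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        })
    return records


def is_snp(record):
    """Simplified check: REF and every ALT allele are single bases."""
    return len(record["ref"]) == 1 and all(len(a) == 1 for a in record["alt"])


# Hypothetical example content, not real OryzaGenome data.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr01\t1203\t.\tA\tG\t50\tPASS\t.",
    "chr01\t4521\t.\tCT\tC\t45\tPASS\t.",
    "chr02\t88\t.\tT\tC,G\t60\tPASS\t.",
]

records = parse_vcf(EXAMPLE)
snps = [r for r in records if is_snp(r)]
print(len(records), [(r["chrom"], r["pos"]) for r in snps])
```

    For real downloads, a dedicated VCF library would handle genotype columns, compressed files and edge cases that this sketch skips.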

  16. StellaBase: the Nematostella vectensis Genomics Database.

    PubMed

    Sullivan, James C; Ryan, Joseph F; Watson, James A; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2006-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at http://www.stellabase.org.

  17. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  18. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview of genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  19. Genome Sequence of Aspergillus flavus NRRL 3357, a Strain That Causes Aflatoxin Contamination of Food and Feed.

    PubMed

    Nierman, William C; Yu, Jiujiang; Fedorova-Abrams, Natalie D; Losada, Liliana; Cleveland, Thomas E; Bhatnagar, Deepak; Bennett, Joan W; Dean, Ralph; Payne, Gary A

    2015-04-16

    Aflatoxin contamination of food and livestock feed results in significant annual crop losses internationally. Aspergillus flavus is the major fungus responsible for this loss. Additionally, A. flavus is the second leading cause of aspergillosis in immunocompromised human patients. Here, we report the genome sequence of strain NRRL 3357.

  20. DBM-DB: the diamondback moth genome database.

    PubMed

    Tang, Weiqi; Yu, Liying; He, Weiyi; Yang, Guang; Ke, Fushi; Baxter, Simon W; You, Shijun; Douglas, Carl J; You, Minsheng

    2014-01-01

    The diamondback moth Genome Database (DBM-DB) is a central online repository for storing and integrating genomic data of diamondback moth (DBM), Plutella xylostella (L.). It provides comprehensive search tools and downloadable datasets for scientists to study comparative genomics, biological interpretation and gene annotation of this insect pest. DBM-DB contains assembled transcriptome datasets from multiple DBM strains and developmental stages, and the annotated genome of P. xylostella (version 2). We have also integrated publicly available ESTs from NCBI and a putative gene set from a second DBM genome (KONAGbase) to enable users to compare different gene models. DBM-DB was developed with the capacity to incorporate future data resources, and will serve as a long-term and open-access database that can be conveniently used for research on the biology, distribution and evolution of DBM. This resource aims to help reduce the impact DBM has on agriculture using genomic and molecular tools. Database URL: http://iae.fafu.edu.cn/DBM/

  1. Megx.net: integrated database resource for marine ecological genomics

    PubMed Central

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

  2. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    SciTech Connect

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has proved useful for whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatics analyses on genes involved in OTA biosynthesis, using a reference genome of an OTA-producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a twofold increase in gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.

  3. Choosing a Genome Browser for a Model Organism Database (MOD): Surveying the Maize Community

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As maize genome sequencing nears completion, the Maize Genetics and Genomics Database (MaizeGDB), the Model Organism Database for maize, integrated a genome browser into its existing Web interface and database. The addition of the MaizeGDB Genome Browser to MaizeGDB will allow it ...

  4. A primer on rapid prototyping of genomic databases in Prolog

    SciTech Connect

    Yoshida, Kaoru; Smith, C.L.; Overbeek, R.

    1992-01-01

    This report presents a tutorial on how one might create an integrated database of genomic information. We outline the required steps for implementation, give a brief introduction to Prolog, and discuss the query facility supported by our system. Our goal is to enable researchers to begin constructing their own biological information system.

  5. MaizeGDB: The Maize Genetics and Genomics Database.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project’s website...

  6. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material. PMID:27489953
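    The abstract above describes turning each pair of database records into a feature vector before a classifier is trained. As a rough illustration only, the sketch below computes three such features for one pair. The record fields (`sequence`, `organism`), the function name `pair_features`, and the use of Python's `difflib` ratio as a crude stand-in for alignment-based sequence identity are all assumptions made for this example; they are not the 22 features or the method used by the authors.

```python
from difflib import SequenceMatcher


def pair_features(rec_a, rec_b):
    """Return a small feature vector for one pair of sequence records.

    Hypothetical record layout: a dict with 'sequence' and 'organism' keys.
    """
    seq_a, seq_b = rec_a["sequence"], rec_b["sequence"]

    # Crude similarity score in [0, 1]; a real pipeline would use an
    # alignment tool rather than difflib (assumption for illustration).
    similarity = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()

    # Ratio of the shorter to the longer sequence length; guard against
    # two empty sequences to avoid division by zero.
    longest = max(len(seq_a), len(seq_b)) or 1
    length_ratio = min(len(seq_a), len(seq_b)) / longest

    # Simple metadata feature: do both records name the same organism?
    same_organism = 1.0 if rec_a["organism"] == rec_b["organism"] else 0.0

    return [similarity, length_ratio, same_organism]


# Made-up records, not taken from any real database.
record_1 = {"sequence": "ATGGCGTACGTTAGC", "organism": "Zea mays"}
record_2 = {"sequence": "ATGGCGTACGTTAGC", "organism": "Zea mays"}
record_3 = {"sequence": "ATGGCGTAC", "organism": "Homo sapiens"}

print(pair_features(record_1, record_2))
print(pair_features(record_1, record_3))
```

    Vectors of this kind, labelled as duplicate or distinct by expert curators, are what a supervised binary or multi-class model would be trained on.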

  7. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  8. Tripal: a construction toolkit for online genome databases.

    PubMed

    Ficklin, Stephen P; Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increase, so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is desired by many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net.

  9. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent af...

  10. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.

  11. OGRe: a relational database for comparative analysis of mitochondrial genomes

    PubMed Central

    Jameson, Daniel; Gibson, Andrew P.; Hudelot, Cendrine; Higgs, Paul G.

    2003-01-01

    Organellar Genome Retrieval (OGRe) is a relational database of complete mitochondrial genome sequences for over 250 Metazoan species. OGRe provides a resource for the comparative analysis of mitochondrial genomes at several levels. At the sequence level, OGRe allows the retrieval of any selected set of mitochondrial genes from any selected set of species. Species are classified using a taxonomic system that allows easy selection of related groups of species. Sequence alignments are also available for some species. At the level of individual nucleotides, the system contains information on base frequencies and codon usage frequencies that can be compared between organisms. At the level of whole genomes, OGRe provides several ways of visualizing information on gene order. Diagrams illustrating the genome arrangement can be generated for any selected set of species automatically from the information in the database. Searches can be done based on gene arrangement to find sets of species that have the same order as one another. Diagrams for pairwise comparison of species can be produced that show the positions of break-points in the gene order and use colour to highlight the sections of the genome that have moved. OGRe is available from http://www.bioinf.man.ac.uk/ogre. PMID:12519982
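    The abstract above mentions pairwise comparisons that locate break-points in gene order. One common way to define a break-point is as a pair of genes that are neighbours in one genome but not in the other. The sketch below applies that definition to two circular gene orders. It is a minimal illustration under stated assumptions: the function names are hypothetical, gene strand is ignored, the gene orders are made up, and nothing here is taken from OGRe's own implementation.

```python
def adjacencies(order, circular=True):
    """Return the set of unordered neighbouring gene pairs in a gene order."""
    n = len(order)
    last = n if circular else n - 1
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(last)}


def breakpoints(order_a, order_b, circular=True):
    """List neighbouring pairs of order_a that are not neighbours in order_b."""
    adj_b = adjacencies(order_b, circular)
    n = len(order_a)
    last = n if circular else n - 1
    result = []
    for i in range(last):
        pair = (order_a[i], order_a[(i + 1) % n])
        if frozenset(pair) not in adj_b:
            result.append(pair)
    return result


# Made-up circular gene orders: cox3 has moved relative to genome_a.
genome_a = ["cox1", "cox2", "atp8", "atp6", "cox3", "nad3"]
genome_b = ["cox1", "cox2", "cox3", "atp8", "atp6", "nad3"]

print(breakpoints(genome_a, genome_b))
# [('cox2', 'atp8'), ('atp6', 'cox3'), ('cox3', 'nad3')]
```

    Moving a single gene produces three break-points here, which is the kind of information a pairwise gene-order diagram highlights.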

  12. DemaDb: an integrated dematiaceous fungal genomes database.

    PubMed

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my.

  13. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  14. WheatGenome.info: an integrated database and portal for wheat genome information.

    PubMed

    Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  15. EuPathDB: the eukaryotic pathogen genomics database resource.

    PubMed

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y; Brestelli, John; Brunk, Brian P; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C; Lawrence, Cris; Li, Wei; Pinney, Deborah F; Pulman, Jane A; Roos, David S; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-04

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.

  16. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  17. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  18. The Genome Database for Rosaceae (GDR): year 10 update.

    PubMed

    Jung, Sook; Ficklin, Stephen P; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G; Mueller, Lukas A; Olmstead, Mercy A; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making.

  19. The Genome Database for Rosaceae (GDR): year 10 update

    PubMed Central

    Jung, Sook; Ficklin, Stephen P.; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G.; Mueller, Lukas A.; Olmstead, Mercy A.; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  20. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  1. Genome Information Broker for Viruses (GIB-V): database for comparative analysis of virus genomes

    PubMed Central

    Hirahata, Masaki; Abe, Takashi; Tanaka, Naoto; Kuwana, Yoshikazu; Shigemoto, Yasumasa; Miyazaki, Satoru; Suzuki, Yoshiyuki; Sugawara, Hideaki

    2007-01-01

    Genome Information Broker for Viruses (GIB-V) is a comprehensive virus genome/segment database. We extracted 18 418 complete virus genomes/segments from the International Nucleotide Sequence Database Collaboration (INSDC) by DNA Data Bank of Japan (DDBJ), EMBL and GenBank and stored them in our system. The list of registered viruses is arranged hierarchically according to taxonomy. Keyword searches can be performed for genome/segment data or biological features of any virus stored in GIB-V. GIB-V is equipped with a BLAST search function, and search results are displayed graphically or in list form. Moreover, the BLAST results can be used online with the ClustalW feature of the DDBJ. All available virus genome/segment data can be collected by the GIB-V download function. GIB-V can be accessed at no charge. PMID:17158166

  2. GEAR: A database of Genomic Elements Associated with drug Resistance

    PubMed Central

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-01-01

    Drug resistance, which generally develops because of genetic mutations in certain molecules, is becoming a serious problem that leads to the failure of standard treatments. Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance), which aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Currently, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships were first extracted from the primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org. PMID:28294141

  3. GEAR: A database of Genomic Elements Associated with drug Resistance.

    PubMed

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-03-15

    Drug resistance, which generally develops because of genetic mutations in certain molecules, is becoming a serious problem that leads to the failure of standard treatments. Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance), which aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Currently, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships were first extracted from the primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org.

  4. Tripal: a construction toolkit for online genome databases

    PubMed Central

    Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E.; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E.; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increases so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net PMID:21959868

  5. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as the suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between query and sequences in DB. In this paper we have described a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industries to improve marketing, sales, and customer support operations. We have applied the descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. Relevance feedback score based on common patterns is developed and employed to compute distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy and the experimental results are presented.
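
    The abstract does not spell out the mining procedure, so the following is only a minimal sketch of the general idea of ranking database sequences by the patterns they share with a query. It assumes (this is not taken from the paper) that the patterns are overlapping k-mers and that the distance is one minus the weighted Jaccard similarity of the k-mer counts. The names kmer_profile, pattern_distance and retrieve are hypothetical.

```python
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer (the assumed 'patterns') in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pattern_distance(a: str, b: str, k: int = 3) -> float:
    """Distance in [0, 1]: 1 minus the weighted Jaccard similarity of k-mer counts."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())  # multiset intersection: minimum of counts
    total = sum((pa | pb).values())   # multiset union: maximum of counts
    return 1.0 - shared / total if total else 0.0


def retrieve(query: str, database: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank database sequences by increasing distance to the query."""
    scored = [(name, pattern_distance(query, seq, k)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1])
```

    For example, retrieve("ACGTAC", {"x": "ACGTAC", "y": "TTTTTT"}) ranks x first at distance 0.0 and y last at distance 1.0, because x shares every 3-mer with the query and y shares none.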

  6. Future vision of the GDB human genome database.

    PubMed

    Cuticchia, A J

    2000-01-01

    In 1973, scientists assembled at the first Human Gene Mapping Workshop to discuss the 64 human genes mapped at that time. In 1989, the GDB Human Genome Database was created to store information on 1,700 mapped human genes. Ten years later, as the human genome project closes in on the release of the complete DNA sequence holding as many as 100,000 human genes, GDB is evolving to continue to meet the needs of the scientific community. Well known as a resource for data which has been stringently reviewed as part of the curation process, GDB prepares to continue to provide a compilation of the human genome including maps, map objects, polymorphisms, and mutations. As more sites across the Internet are established to share biological information, it becomes increasingly burdensome for the scientist to collect data from all sources of a particular domain. In an attempt to reduce this burden, GDB continues to load data from large genome centres and accept submissions from researchers around the world. Moreover, GDB looks to provide a mechanism to link gene-related information to the human reference sequence. In doing this, GDB plans to establish federated linkages with "boutique" databases around the world that could contain enormous amounts of valuable information about specific genes or chromosomes.

  7. CnidBase: The Cnidarian Evolutionary Genomics Database

    PubMed Central

    Ryan, Joseph F.; Finnerty, John R.

    2003-01-01

    CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972

  8. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

  9. A web-based genomic sequence database for the Streptomycetaceae: a tool for systematics and genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...

  10. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. 
    Storing publicly available breeding data in a database together with genomic and genetic data will...

  11. MonarchBase: the monarch butterfly genome database

    PubMed Central

    Zhan, Shuai; Reppert, Steven M.

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly’s migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations. PMID:23143105

  12. MonarchBase: the monarch butterfly genome database.

    PubMed

    Zhan, Shuai; Reppert, Steven M

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly's migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations.

  13. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    PubMed

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-04

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies.

  14. Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary

    2013-01-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149

  15. Mouse Genome Database: From sequence to phenotypes and disease models

    PubMed Central

    Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326

  16. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which has simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data easier to use and more readily available, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and SNPs collected from the three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistic Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies.
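
    The abstract names Fst but not the estimator used. As a hedged illustration only, the classical definition Fst = (H_T - H_S) / H_T for a single biallelic SNP can be computed from per-population allele frequencies, assuming equally weighted populations. The function name fst is hypothetical, and this is not a description of the DoGSD pipeline.

```python
def fst(freqs: list[float]) -> float:
    """Wright's Fst for one biallelic SNP.

    freqs holds the frequency of one allele in each population;
    every population is given equal weight (an assumption of this sketch).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0                 # allele fixed everywhere: no differentiation to measure
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t
```

    Under this definition, fst([0.0, 1.0]) is 1.0 (two populations fixed for different alleles) and fst([0.5, 0.5]) is 0.0 (identical frequencies).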

  17. Functional Genomic Analysis of Aspergillus flavus Interacting with Resistant and Susceptible Peanut

    PubMed Central

    Wang, Houmiao; Lei, Yong; Yan, Liying; Wan, Liyun; Ren, Xiaoping; Chen, Silong; Dai, Xiaofeng; Guo, Wei; Jiang, Huifang; Liao, Boshou

    2016-01-01

    In the Aspergillus flavus (A. flavus)–peanut pathosystem, development and metabolism of the fungus directly influence aflatoxin contamination. To comprehensively understand the molecular mechanism of A. flavus interaction with peanut, RNA-seq was used for global transcriptome profiling of A. flavus during interaction with resistant and susceptible peanut genotypes. In total, 67.46 Gb of high-quality bases were generated for A. flavus-resistant (af_R) and -susceptible peanut (af_S) at one (T1), three (T2) and seven (T3) days post-inoculation. The uniquely mapped reads to A. flavus reference genome in the libraries of af_R and af_S at T2 and T3 were subjected to further analysis, with more than 72% of all obtained genes expressed in the eight libraries. Comparison of expression levels both af_R vs. af_S and T2 vs. T3 uncovered 1926 differentially expressed genes (DEGs). DEGs associated with mycelial growth, conidial development and aflatoxin biosynthesis were up-regulated in af_S compared with af_R, implying that A. flavus mycelia more easily penetrate and produce much more aflatoxin in susceptible than in resistant peanut. Our results serve as a foundation for understanding the molecular mechanisms of aflatoxin production differences between A. flavus-R and -S peanut, and offer new clues to manage aflatoxin contamination in crops. PMID:26891328

  18. The integrated web service and genome database for agricultural plants with biotechnology information.

    PubMed

    Kim, Changkug; Park, Dongsuk; Seol, Youngjoo; Hahn, Jangho

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage.

  19. Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production

    PubMed Central

    Yin, Xian; Shin, Hyun-dong; Li, Jianghua; Du, Guocheng; Liu, Long; Chen, Jian

    2017-01-01

    Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of citrate-producing strains—namely, A. niger H915-1 (citrate titer: 157 g L−1), A1 (117 g L−1), and L2 (76 g L−1)—to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation, but neither an influence on the pH decrease nor a difference in citrate production was observed. PMID:28106122

  20. MaizeGDB: The Maize Genetics and Genomics Database.

    PubMed

    Harper, Lisa; Gardiner, Jack; Andorf, Carson; Lawrence, Carolyn J

    2016-01-01

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website ( http://www.maizegdb.org ) are custom interfaces enabling researchers to browse data and to seek out specific information matching explicit search criteria. In addition, pre-compiled reports are made available for particular types of data and bulletin boards are provided to facilitate communication and coordination among members of the community of maize geneticists.

  1. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research

    PubMed Central

    Jung, Sook; Jesudurai, Christopher; Staton, Margaret; Du, Zhidian; Ficklin, Stephen; Cho, Ilhyung; Abbott, Albert; Tomkins, Jeffrey; Main, Dorrie

    2004-01-01

    Background: Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description: The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides a graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions: The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed online. PMID:15357877

  2. Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integration of Plant Genome-Related Databases

    PubMed Central

    Asamizu, Erika; Ichihara, Hisako; Nakaya, Akihiro; Nakamura, Yasukazu; Hirakawa, Hideki; Ishii, Takahiro; Tamura, Takuro; Fukami-Kobayashi, Kaoru; Nakajima, Yukari; Tabata, Satoshi

    2014-01-01

    The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj is comprised of three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on the amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the cluster with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information of DNA markers, quantitative trait loci and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder and biofuel, researchers in the field of basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits and phenotypes of interest. The PGDBj will return the search results from the component DBs and various types of linked external DBs. PMID:24363285
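
    The Ortholog DB step (reciprocal BLAST searches followed by clustering) can be illustrated with a minimal sketch. It assumes, without support from the abstract, that two sequences are linked when each hits the other at or above a score threshold and that clusters are the connected components of those links. The function names, the min_score parameter and the input layout are hypothetical, and this is not the PGDBj pipeline itself.

```python
def reciprocal_pairs(hits: dict[tuple[str, str], float],
                     min_score: float = 50.0) -> set[frozenset[str]]:
    """hits[(query, subject)] is an alignment score; keep pairs that hit each other."""
    return {
        frozenset((q, s))
        for (q, s), score in hits.items()
        if q != s and score >= min_score and hits.get((s, q), 0.0) >= min_score
    }


def cluster(pairs: set[frozenset[str]]) -> list[set[str]]:
    """Group linked sequences into connected components with a small union-find."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)          # merge the two components

    groups: dict[str, set[str]] = {}
    for member in list(parent):
        groups.setdefault(find(member), set()).add(member)
    return list(groups.values())
```

    With hits for a->b, b->a, b->c and c->b above the threshold and a one-way hit d->a, the sketch returns a single cluster {a, b, c}; d is left out because its hit is not reciprocated.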

  3. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    PubMed Central

    2010-01-01

    Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105

  4. Metabolic network driven analysis of genome-wide transcription data from Aspergillus nidulans

    PubMed Central

    David, Helga; Hofmann, Gerald; Oliveira, Ana Paula; Jarmer, Hanne; Nielsen, Jens

    2006-01-01

    Background Aspergillus nidulans (the asexual form of Emericella nidulans) is a model organism for aspergilli, which are an important group of filamentous fungi that encompasses human and plant pathogens as well as industrial cell factories. Aspergilli have a highly diversified metabolism and, because of their medical, agricultural and biotechnological importance, it would be valuable to have an understanding of how their metabolism is regulated. We therefore conducted a genome-wide transcription analysis of A. nidulans grown on three different carbon sources (glucose, glycerol, and ethanol) with the objective of identifying global regulatory structures. Furthermore, we reconstructed the complete metabolic network of this organism, which resulted in linking 666 genes to metabolic functions, as well as assigning metabolic roles to 472 genes that were previously uncharacterized. Results Through combination of the reconstructed metabolic network and the transcription data, we identified subnetwork structures that pointed to coordinated regulation of genes that are involved in many different parts of the metabolism. Thus, for a shift from glucose to ethanol, we identified coordinated regulation of the complete pathway for oxidation of ethanol, as well as upregulation of gluconeogenesis and downregulation of glycolysis and the pentose phosphate pathway. Furthermore, on change in carbon source from glucose to ethanol, the cells shift from using the pentose phosphate pathway as the major source of NADPH (nicotinamide adenine dinucleotide phosphate, reduced form) for biosynthesis to use of the malic enzyme. Conclusion Our analysis indicates that some of the genes are regulated by common transcription factors, making it possible to establish new putative links between known transcription factors and genes through clustering. PMID:17107606

  5. fPoxDB: fungal peroxidase database for comparative genomics

    PubMed Central

    2014-01-01

    Background Peroxidases are a group of oxidoreductases which mediate electron transfer from hydrogen peroxide (H2O2) and organic peroxide to various electron acceptors. They possess a broad spectrum of impact on industry and fungal biology. There are numerous industrial applications using peroxidases, such as to catalyse highly reactive pollutants and to break down lignin for recycling of carbon sources. Moreover, genes encoding peroxidases play important roles in fungal pathogenicity in both humans and plants. For better understanding of fungal peroxidases at the genome level, a novel genomics platform is required. To this end, Fungal Peroxidase Database (fPoxDB; http://peroxidase.riceblast.snu.ac.kr/) has been developed to provide such a genomics platform for this important gene family. Description In order to identify and classify fungal peroxidases, 24 sequence profiles were built and applied to 331 genomes including 216 from fungi and Oomycetes. In addition, NoxR, which is known to regulate NADPH oxidases (NoxA and NoxB) in fungi, was also added to the pipeline. Collectively, 6,113 genes were predicted to encode 25 gene families, presenting well-separated distribution along the taxonomy. For instance, the genes encoding lignin peroxidase, manganese peroxidase, and versatile peroxidase were concentrated in the rot-causing basidiomycetes, reflecting their ligninolytic capability. As a genomics platform, fPoxDB provides diverse analysis resources, such as gene family predictions based on fungal sequence profiles, pre-computed results of eight bioinformatics programs, similarity search tools, a multiple sequence alignment tool, domain analysis functions, and taxonomic distribution summary, some of which are not available in the previously developed peroxidase resource. In addition, fPoxDB is interconnected with other family web systems, providing extended analysis opportunities. Conclusions fPoxDB is a fungi-oriented genomics platform for peroxidases.
The sequence

  6. Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?

    PubMed Central

    Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton

    2012-01-01

    Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the Influenza pandemic plans developed by WHO, Canada, and Quebec, as examples of key tools framing public health decision-making process. We observed that State powers in public health, are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and, an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174

  7. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  8. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der (Lawrence Berkeley Lab., CA)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  9. Using FlyBase, a Database of Drosophila Genes & Genomes

    PubMed Central

    Marygold, Steven J.; Crosby, Madeline A.; Goodman, Joshua L.

    2016-01-01

    SUMMARY For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries. PMID:27730573

  10. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-02

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.

  11. Oryzabase. An integrated biological and genome information database for rice.

    PubMed

    Kurata, Nori; Yamazaki, Yukiko

    2006-01-01

    The aim of Oryzabase is to create a comprehensive view of rice (Oryza sativa) as a model monocot plant by integrating biological data with molecular genomic information (http://www.shigen.nig.ac.jp/rice/oryzabase/top/top.jsp). The database contains information about rice development and anatomy, rice mutants, and genetic resources, especially for wild varieties of rice. The anatomical description of rice development is unique and is the first known representation for rice. Developmental and anatomical descriptions include in situ gene expression data serving as stage and tissue markers. The systematic presentation of a large number of rice mutant and mutant trait genes is indispensable, as is description of research in wild strains, core collections, and their detailed characterization. Several genetic, physical, and expression maps with full genome and cDNA sequences are also combined with biological data in Oryzabase. These datasets, when pooled together, could provide a useful tool for gaining greater knowledge about the life cycle of rice, the relationship between phenotype and gene function, and rice genetic diversity. For exchanging community information, Oryzabase publishes the Rice Genetics Newsletter organized by the Rice Genetics Cooperative and provides a mailing service, rice-e-net/rice-net.

  12. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  13. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    DOE Data Explorer

    Loots, Gabriela G. [LLNL]; Ovcharenko, I. [LLNL]

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. This database of evolutionary conserved regions (ECRs) in vertebrate genomes features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites (TFBSs) in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and fugu genomes. (Taken from paper in journal: Bioinformatics, November 7, 2006, pp. 122-124.)

  14. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    SciTech Connect

    Loots, G; Ovcharenko, I

    2006-08-08

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes entitled ECRbase that is constructed from a collection of pairwise vertebrate genome alignments produced by the ECR Browser database. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a collection of promoters in all vertebrate genomes presented in the database. The database also contains a collection of annotated transcription factor binding sites (TFBS) in all ECRs and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and two pufferfish genomes. It is freely accessible at http://ECRbase.dcode.org.
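
    The idea of extracting conserved regions from a pairwise alignment can be sketched as follows. This is only an illustration: the record does not state the criteria ECRbase uses, so the function name, the window length and the identity threshold below are all assumptions chosen for the example.

```python
# Minimal sketch: find conserved stretches in a pairwise alignment.
# The window size and identity threshold are illustrative assumptions,
# not the parameters used by ECRbase or the ECR Browser.
def conserved_regions(seq_a, seq_b, window=10, min_identity=0.7):
    """Return merged half-open (start, end) intervals of alignment columns
    covered by at least one window whose identity is >= min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = [int(x == y and x != "-") for x, y in zip(seq_a, seq_b)]
    regions = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Toy alignment: 12 identical columns followed by 10 mismatched columns.
aligned_a = "ACGTACGTACGT" + "AAAAAAAAAA"
aligned_b = "ACGTACGTACGT" + "CCCCCCCCCC"
print(conserved_regions(aligned_a, aligned_b))
```

    Merging overlapping windows keeps each conserved stretch as a single interval, which is convenient if the intervals are later intersected with promoter or binding-site annotations.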

  15. Use of functional genomics to assess the climate change impact on Aspergillus flavus and aflatoxin production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is an opportunistic pathogenic fungus that infects several crops of agricultural importance, among them, corn, cotton, and peanuts. Once established as a pathogen the fungus may secrete secondary metabolites commonly known as mycotoxins, that if consumed by humans or animals may r...

  16. Genome wide association mapping of Aspergillus flavus and aflatoxin accumulation resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Contamination of maize with aflatoxin, produced by the fungus Aspergillus flavus, has severe health and economic consequences. Efforts to reduce aflatoxin accumulation in maize have focused on identifying and selecting germplasm with natural host resistance factors, and several maize lines with sign...

  17. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

  18. Comparative genomic analysis of Aspergillus oryzae strains 3.042 and RIB40 for soy sauce fermentation.

    PubMed

    Zhao, Guozhong; Yao, Yunping; Wang, Chunling; Hou, Lihua; Cao, Xiaohong

    2013-06-17

    The filamentous fungus Aspergillus oryzae 3.042 (Chinese strain) is a close relative of A. oryzae RIB40 (Japanese strain), which is the important agent used for soy sauce fermentation. The genome of A. oryzae 3.042 was sequenced and compared with A. oryzae RIB40 in an attempt to understand why different soy sauce flavors are produced by these strains. The A. oryzae 3.042 chromosome is 36,547,279 bp and contains 11,399 protein-encoding genes. MUMmer analysis revealed that the genomes of A. oryzae 3.042 and RIB40 are mostly collinear. Genome sequence data and comparative analysis of the two strains identified several strain-specific genes that encode putative proteins involved in cell growth, salt tolerance, environmental resistance and flavor formation. A. oryzae 3.042 showed stronger potential for mycelial growth. Some genes unique to A. oryzae RIB40 were related to salt tolerance, especially genes for K(+) transport, while others were associated with ester formation and amino acid metabolism, which likely contribute to flavor formation. In conclusion, comparative genome analysis provided insights into the different genetic traits of the two A. oryzae strains. The strain-specific genes that we found in A. oryzae are likely relevant to soy sauce fermentation.

  19. Draft genome sequences of two closely-related aflatoxigenic Aspergillus species obtained from the Ivory Coast

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...

  20. Genomic analysis of the aconidial and high-performance protein producer, industrially relevant Aspergillus niger SH2 strain.

    PubMed

    Yin, Chao; Wang, Bin; He, Pan; Lin, Ying; Pan, Li

    2014-05-15

    Aspergillus niger is usually regarded as a beneficial species widely used in the biotechnology industry. Obtaining the genome sequence of the widely used aconidial A. niger SH2 strain is of great importance for understanding its unusual production capability. In this study we assembled a high-quality genome sequence of A. niger SH2 with approximately 11,517 ORFs. A relatively high proportion of genes enriched for protein expression-related FunCat items supports its efficient capacity for protein production. Furthermore, genome-wide comparative analysis between A. niger SH2 and CBS513.88 reveals insights into unique properties of A. niger SH2. A. niger SH2 lacks the gene related to the initiation of asexual sporulation (PrpA), leading to its distinct aconidial phenotype. Frameshift mutations and non-synonymous SNPs in genes of cell wall integrity signaling, β-1,3-glucan synthesis and chitin synthesis influence its cell wall development, which is important for its hyphal fragmentation during industrial high-efficiency protein production.

  1. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.

  2. SpBase: the sea urchin genome database and web site.

    PubMed

    Cameron, R Andrew; Samanta, Manoj; Yuan, Autumn; He, Dong; Davidson, Eric

    2009-01-01

    SpBase is a system of databases focused on the genomic information from sea urchins and related echinoderms. It is exposed to the public through a web site served with open source software (http://spbase.org/). The enterprise was undertaken to provide an easily used collection of information to directly support experimental work on these useful research models in cell and developmental biology. The information served from the databases emerges from the draft genomic sequence of the purple sea urchin, Strongylocentrotus purpuratus and includes sequence data and genomic resource descriptions for other members of the echinoderm clade which in total span 540 million years of evolutionary time. This version of the system contains two assemblies of the purple sea urchin genome, associated expressed sequences, gene annotations and accessory resources. Search mechanisms for the sequences and the gene annotations are provided. Because the system is maintained along with the Sea Urchin Genome resource, a database of sequenced clones is also provided.

  3. Cloning and Genomic Organization of a Rhamnogalacturonase Gene from Locally Isolated Strain of Aspergillus niger.

    PubMed

    Damak, Naourez; Abdeljalil, Salma; Taeib, Noomen Hadj; Gargouri, Ali

    2015-08-01

    The rhg gene encoding a rhamnogalacturonase was isolated from the novel strain A1 of Aspergillus niger. It consists of an ORF of 1.505 kb encoding a putative protein of 446 amino acids with a predicted molecular mass of 47 kDa, belonging to family 28 of glycosyl hydrolases. The nature and position of amino acids comprising the active site as well as the three-dimensional structure were well conserved between the A. niger CTM10548 and fungal rhamnogalacturonases. The coding region of the rhg gene is interrupted by three short introns of 56 (introns 1 and 3) and 52 (intron 2) bp in length. The comparison of the peptide sequence with A. niger rhg sequences revealed that the A1 rhg should be an endo-rhamnogalacturonase, more homologous to the known A. niger enzyme rhg A than to rhg B. The comparison of the rhg nucleotide sequence from A. niger A1 with rhg A from A. niger shows several base changes. Most of these changes (59%) are located at the third base of codons, suggesting that the enzyme function is maintained. We used the rhamnogalacturonase A from Aspergillus aculeatus as a template to build a structural model of rhg A1 that adopted a right-handed parallel β-helix.
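
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 1.505 kb figure covers the coding exons plus the three introns and includes one stop codon; these are assumptions made for illustration, not statements from the record, and the function name is hypothetical.

```python
# Consistency check on the reported rhg gene structure (illustrative assumptions).
PROTEIN_LENGTH_AA = 446            # reported length of the putative protein
INTRON_LENGTHS_BP = [56, 52, 56]   # introns 1, 2 and 3 as reported
STOP_CODON_BP = 3                  # assumption: the ORF figure includes one stop codon


def genomic_orf_length(protein_aa, introns_bp, stop_bp=STOP_CODON_BP):
    """Return coding length (codons plus stop) plus total intron length, in bp."""
    coding_bp = protein_aa * 3 + stop_bp
    return coding_bp + sum(introns_bp)


total_bp = genomic_orf_length(PROTEIN_LENGTH_AA, INTRON_LENGTHS_BP)
print(total_bp)  # 1505
```

    Under these assumptions the 446 codons, one stop codon and 164 bp of introns add up to 1505 bp, which matches the 1.505 kb reported for the ORF.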

  4. GenomeCRISPR - a database for high-throughput CRISPR/Cas9 screens

    PubMed Central

    Rauscher, Benedikt; Heigwer, Florian; Breinig, Marco; Winter, Jan; Boutros, Michael

    2017-01-01

    Over the past years, CRISPR/Cas9 mediated genome editing has developed into a powerful tool for modifying genomes in various organisms. In high-throughput screens, CRISPR/Cas9 mediated gene perturbations can be used for the systematic functional analysis of whole genomes. Discoveries from such screens provide a wealth of knowledge about gene to phenotype relationships in various biological model systems. However, a database resource to query results efficiently has been lacking. To this end, we developed GenomeCRISPR (http://genomecrispr.org), a database for genome-scale CRISPR/Cas9 screens. Currently, GenomeCRISPR contains data on more than 550 000 single guide RNAs (sgRNA) derived from 84 different experiments performed in 48 different human cell lines, comprising all screens in human cells using CRISPR/Cas published to date. GenomeCRISPR provides data mining options and tools, such as gene or genomic region search. Phenotypic and genome track views allow users to investigate and compare the results of different screens, or the impact of different sgRNAs on the gene of interest. An Application Programming Interface (API) allows for automated data access and batch download. As more screening data become available, we also aim to extend the database to include functional genomic data from other organisms and enable cross-species comparisons. PMID:27789686

  5. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At the Maize Genetics and Genomics Database (MaizeGDB), we have developed a number of video tutorials that aim to demonstrate how to use various tools as well as to explici...

  6. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  7. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    ERIC Educational Resources Information Center

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  8. The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

    PubMed Central

    Kang, Fan; Angiuoli, Samuel V.; White, Owen; Botstein, David; Dolinski, Kara

    2007-01-01

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools. PMID:17712414

  9. GBshape: a genome browser database for DNA shape annotations.

    PubMed

    Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo

    2015-01-01

    Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.

  10. Database of Periodic DNA Regions in Major Genomes.

    PubMed

    Frenkel, Felix E; Korotkova, Maria A; Korotkov, Eugene V

    2017-01-01

    Summary. We searched several prokaryotic and eukaryotic genomes for periodic sequences using a new mathematical method based on random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, which adds novelty to the results obtained. The periodicity length, one of the key periodicity features, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes, including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes.

  11. Database of Periodic DNA Regions in Major Genomes

    PubMed Central

    2017-01-01

    Summary. We searched several prokaryotic and eukaryotic genomes for periodic sequences using a new mathematical method based on random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, which adds novelty to the results obtained. The periodicity length, one of the key periodicity features, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes, including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes. PMID:28182099

  12. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; Vondervoot, Peter J.I. van de; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristen F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; Dijck, Piet W.M. van; Hofmann, Gerald; Lasure, Linda L.; Magnusson, Jon K.; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; Ooyen, Albert J.J. van; Panther, Kathyrn S.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hen; Tsang, Adrian; Brink, Johannes M. van den; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Kubicek, Christian P.; Martinez, Diego; Peij, Noel N.M.E. van; Roubos, Johannes A.; Nielsen, Jens

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters. The results and datasets from this integrative systems biology analysis provide a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi. [Supplemental materials (10 figures, three text documents and 16 tables) have been made available

  13. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  14. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  15. Characterization of the Mutagenic Spectrum of 4-Nitroquinoline 1-Oxide (4-NQO) in Aspergillus nidulans by Whole Genome Sequencing

    PubMed Central

    Downes, Damien J.; Chonofsky, Mark; Tan, Kaeling; Pfannenstiel, Brandon T.; Reck-Peterson, Samara L.; Todd, Richard B.

    2014-01-01

    4-Nitroquinoline 1-oxide (4-NQO) is a highly carcinogenic chemical that induces mutations in bacteria, fungi, and animals through the formation of bulky purine adducts. 4-NQO has been used as a mutagen for genetic screens and in the study of both DNA damage and DNA repair. In the model eukaryote Aspergillus nidulans, 4-NQO-based genetic screens have been used to study diverse processes, including gene regulation, mitosis, metabolism, organelle transport, and septation. Early work during the 1970s using bacterial and yeast mutation tester strains concluded that 4-NQO was a guanine-specific mutagen. However, these strains were limited in their ability to determine full mutagenic potential, as they could not identify mutations at multiple sites, unlinked suppressor mutations, or G:C to C:G transversions. We have now used a whole genome resequencing approach with mutant strains generated from two independent genetic screens to determine the full mutagenic spectrum of 4-NQO in A. nidulans. Analysis of 3994 mutations from 38 mutant strains reveals that 4-NQO induces substitutions in both guanine and adenine residues, although with a 19-fold preference for guanine. We found no association between mutation load and mutagen dose and observed no sequence bias in the residues flanking the mutated purine base. The mutations were distributed randomly throughout most of the genome. Our data provide new evidence that 4-NQO can potentially target all base pairs. Furthermore, we predict that current practices for 4-NQO-induced mutagenesis are sufficient to reach gene saturation for genetic screens with feasible identification of causative mutations via whole genome resequencing. PMID:25352541

  16. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  17. Databases in SenseLab for the Genomics, Proteomics, and Function of Olfactory Receptors

    PubMed Central

    Marenco, Luis N.; Bahl, Gautam; Hyland, Lorra; Shi, Jing; Wang, Rixin; Lai, Peter C.; Miller, Perry L.; Shepherd, Gordon M.; Crasto, Chiquito J.

    2013-01-01

    We present here the salient aspects of three databases: the Olfactory Receptor Database (ORDB) is a repository of genomics and proteomics information on olfactory receptors (ORs); OdorDB stores information related to odorous compounds, specifically identifying those that have been shown to interact with olfactory receptors; and OdorModelDB disseminates information related to computational models of ORs. The data stored among these databases are integrated. Presented in this chapter are descriptions of these resources, which are part of the SenseLab suite of databases, and a discussion of the computational infrastructure that enhances the efficacy of information storage, retrieval, dissemination, and automated data population from external sources. PMID:23585030

  18. Expanded national database collection and data coverage in the FINDbase worldwide database for clinically relevant genomic variation allele frequencies

    PubMed Central

    Viennas, Emmanouil; Komianou, Angeliki; Mizzi, Clint; Stojiljkovic, Maja; Mitropoulou, Christina; Muilu, Juha; Vihinen, Mauno; Grypioti, Panagiota; Papadaki, Styliani; Pavlidis, Cristiana; Zukic, Branka; Katsila, Theodora; van der Spek, Peter J.; Pavlovic, Sonja; Tzimas, Giannis; Patrinos, George P.

    2017-01-01

    FINDbase (http://www.findbase.org) is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in well-distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications for clinical pharmacogenomics. We also report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes development and curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer (http://www.getpivot.com) software, (iii) new data visualization tools and (iv) the interrelation of FINDbase with the DruGeVar database, with direct implications for clinical pharmacogenomics. These updates further enhance the impact of FINDbase as a key resource for Genomic Medicine applications. PMID:27924022

  19. Expanded national database collection and data coverage in the FINDbase worldwide database for clinically relevant genomic variation allele frequencies.

    PubMed

    Viennas, Emmanouil; Komianou, Angeliki; Mizzi, Clint; Stojiljkovic, Maja; Mitropoulou, Christina; Muilu, Juha; Vihinen, Mauno; Grypioti, Panagiota; Papadaki, Styliani; Pavlidis, Cristiana; Zukic, Branka; Katsila, Theodora; van der Spek, Peter J; Pavlovic, Sonja; Tzimas, Giannis; Patrinos, George P

    2017-01-04

    FINDbase (http://www.findbase.org) is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in well-distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications for clinical pharmacogenomics. We also report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes development and curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer (http://www.getpivot.com) software, (iii) new data visualization tools and (iv) the interrelation of FINDbase with the DruGeVar database, with direct implications for clinical pharmacogenomics. These updates further enhance the impact of FINDbase as a key resource for Genomic Medicine applications.

  20. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  1. Databases and Web Tools for Cancer Genomics Study

    PubMed Central

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-01-01

    Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community. PMID:25707591

  2. Databases and web tools for cancer genomics study.

    PubMed

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community.

  3. The Human OligoGenome Resource: a database of oligonucleotide capture probes for resequencing target regions across the human genome.

    PubMed

    Newburger, Daniel E; Natsoulis, Georges; Grimes, Sue; Bell, John M; Davis, Ronald W; Batzoglou, Serafim; Ji, Hanlee P

    2012-01-01

    Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. This includes candidate region diagnostic resequencing and novel variant validation from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource (http://oligogenome.stanford.edu/). This online database contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.

  4. PGSB PlantsDB: updates to the database framework for comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C.; Martis, Mihaela M.; Seidel, Michael; Kugler, Karl G.; Gundlach, Heidrun; Mayer, Klaus F.X.

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  5. Minos as a novel Tc1/mariner-type transposable element for functional genomic analysis in Aspergillus nidulans.

    PubMed

    Evangelinos, Minoas; Anagnostopoulos, Gerasimos; Karvela-Kalogeraki, Iliana; Stathopoulou, Panagiota M; Scazzocchio, Claudio; Diallinas, George

    2015-08-01

    Transposons constitute powerful genetic tools for gene inactivation, exon or promoter trapping and genome analyses. The Minos element from Drosophila hydei, a Tc1/mariner-like transposon, has proved to be a very efficient tool for heterologous transposition in several metazoa. In filamentous fungi, only a handful of fungal-specific transposable elements have been exploited as genetic tools, with the impala Tc1/mariner element from Fusarium oxysporum being the most successful. Here, we developed a two-component transposition system to manipulate Minos transposition in Aspergillus nidulans (AnMinos). Our system allows direct selection of transposition events based on re-activation of niaD, a gene necessary for growth on nitrate as a nitrogen source. On average, among 10^8 conidiospores, we obtain up to ∼0.8×10^2 transposition events leading to the expected revertant phenotype (niaD+), while ∼16% of excision events lead to AnMinos loss. Characterized excision footprints consisted of the four terminal bases of the transposon flanked by the TA target duplication and led to no major DNA rearrangements. AnMinos transposition depends on the presence of its homologous transposase. Its frequency was not significantly affected by temperature, UV irradiation or the transcription status of the original integration locus (niaD). Importantly, transposition is dependent on nkuA, encoding an enzyme essential for non-homologous end joining of DNA in double-strand break repair. AnMinos proved to be an efficient tool for functional analysis as it seems to transpose into different genomic loci on all chromosomes, including a high proportion of integration events within or close to genes. We have used Minos to obtain morphological and toxic analogue resistant mutants. Interestingly, among morphological mutants some seem to be due to Minos-elicited over-expression of specific genes, rather than gene inactivation.

  6. Databases and information integration for the Medicago truncatula genome and transcriptome.

    PubMed

    Cannon, Steven B; Crow, John A; Heuer, Michael L; Wang, Xiaohong; Cannon, Ethalinda K S; Dwan, Christopher; Lamblin, Anne-Francoise; Vasdewani, Jayprakash; Mudge, Joann; Cook, Andrew; Gish, John; Cheung, Foo; Kenton, Steve; Kunau, Timothy M; Brown, Douglas; May, Gregory D; Kim, Dongjin; Cook, Douglas R; Roe, Bruce A; Town, Chris D; Young, Nevin D; Retzel, Ernest F

    2005-05-01

    An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de).

  7. BambooGDB: a bamboo genome database with functional annotation and an analysis platform

    PubMed Central

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, one of the most important non-timber forest products and among the fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. The recent first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies building on previous achievements, we have developed BambooGDB, a bamboo genome database with functional annotation and an analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform comprising comparative genomic analysis, protein–protein interaction networks, pathway analysis and visualization of genomic data was constructed. As a discovery tool to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. By integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide researchers worldwide with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org PMID:24602877

  8. BambooGDB: a bamboo genome database with functional annotation and an analysis platform.

    PubMed

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, one of the most important non-timber forest products and among the fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. The recent first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies building on previous achievements, we have developed BambooGDB, a bamboo genome database with functional annotation and an analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform comprising comparative genomic analysis, protein-protein interaction networks, pathway analysis and visualization of genomic data was constructed. As a discovery tool to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. By integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide researchers worldwide with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org.

  9. The MAPPER2 Database: a multi-genome catalog of putative transcription factor binding sites

    PubMed Central

    Riva, Alberto

    2012-01-01

    The MAPPER2 Database (http://genome.ufl.edu/mapperdb) is a component of MAPPER2, a web-based system for the analysis of transcription factor binding sites in multiple genomes. The database contains predicted binding sites identified in the promoters of all human, mouse and Drosophila genes using 1017 probabilistic models representing over 600 different transcription factors. In this article we outline the current contents of the database and we describe its web-based user interface in detail. We then discuss ongoing work to extend the database contents to experimental data and to add analysis capabilities. Finally, we provide information about recent improvements to the hardware and software platform that MAPPER2 is based on. PMID:22121218

  10. RefSeq microbial genomes database: new representation and annotation strategy

    PubMed Central

    Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O’Neill, Kathleen; Tolstoy, Igor

    2014-01-01

    The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks. PMID:24316578

  11. RefSeq microbial genomes database: new representation and annotation strategy.

    PubMed

    Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

    2014-01-01

    The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

  12. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    PubMed

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  13. DEPPDB - DNA electrostatic potential properties database. Electrostatic properties of genome DNA elements.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G

    2012-04-01

    Electrostatic properties of genome DNA are important to its interactions with different proteins, particularly those involved in transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA, combined with its sequence and with annotation of the biological and structural properties of genomes and their elements. Genomes are organized on a taxonomic basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial and plastid genomes according to NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established databases and the literature. The database is complemented by analytical tools, and calculations on user-supplied sequences are available. Case studies found that electrostatics complement DNA bending in the functioning of the E. coli plasmid BNT2 promoter, possibly affecting the host-environment metabolic switch. Transcription factor binding sites gravitate to high-potential regions, confirming the universal importance of electrostatics in protein-DNA interactions beyond classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, which exhibit taxonomic correlations. The need for studies of genome electrostatic properties is discussed.

  14. Mining genomic databases to identify novel hydrogen producers.

    PubMed

    Kalia, Vipin C; Lal, Sadhana; Ghai, Rohit; Mandal, Manabendra; Chauhan, Ashwini

    2003-04-01

    The realization that fossil fuel reserves are limited, together with their adverse effect on the environment, has forced us to look into alternative sources of energy. Hydrogen is a strong contender as a future fuel. Biological hydrogen production ranges from 0.37 to 3.3 moles H2 per mole of glucose and, considering the high theoretical value of production (4.0 moles H2 per mole of glucose), it is worth exploring approaches to increase hydrogen yields. Screening the untapped microbial population is a promising possibility. Sequence analysis and pathway alignment of hydrogen metabolism in complete and incomplete genomes have led to the identification of potential hydrogen producers.
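
    As a quick check on the figures quoted in this abstract, the reported yields can be expressed as a fraction of the theoretical maximum. The sketch below is illustrative only: the function name is hypothetical, and the only inputs are the three numbers given in the abstract.

```python
# Yields quoted in the abstract, in moles of H2 per mole of glucose.
REPORTED_LOW = 0.37
REPORTED_HIGH = 3.3
THEORETICAL_MAX = 4.0


def fraction_of_theoretical(observed: float, maximum: float = THEORETICAL_MAX) -> float:
    """Return the observed yield as a fraction of the theoretical maximum."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return observed / maximum


if __name__ == "__main__":
    for label, value in (("low", REPORTED_LOW), ("high", REPORTED_HIGH)):
        print(f"{label}: {fraction_of_theoretical(value):.1%} of theoretical")
```

    On these figures, reported yields span roughly 9% to 83% of the theoretical value, which is the gap the authors propose to narrow by screening for new producers.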

  15. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world

    PubMed Central

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic studies. Reflecting the huge diversity of the microbial world, the number of microbial genome projects has now reached several thousand. To efficiently explore the diversity of the entire body of microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, an efficient incremental updating procedure can create an extended ortholog table by adding genomes to the default ortholog table generated from the representative set of genomes. Combined with dynamic orthology calculation for any specified set of organisms, these features make MBGD an efficient and flexible tool for exploring microbial genome diversity. PMID:23118485

  16. A Database Federation Platform for Gene Chips and the Human Genome Database

    DTIC Science & Technology

    2007-11-02

    different methods and approaches. [6-14] This activity was very pronounced during the period around 1990-1994, but nearly all of those projects ... Java and JDBC also simplified many of the internal structures [19]. It is entirely consistent with this architecture to use other methods, including ... results are readily generalizable to other experimental methods and other database collections. No compromises have been made that would restrict

  17. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    PubMed Central

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) represent a key component of the genome and are the most significant targets for producing stress-tolerant crops; functional genomic studies of these TFs are therefore important. Here we have constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. The database provides comprehensive information for each identified TF, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information can be used to expedite functional genomic studies of specific TFs of interest. The database aims to support functional genomics studies of common bean TFs and to help identify the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production. It offers search interfaces by gene ID and functional annotation, and browsing interfaces by family and by chromosome. The database will also serve as a central repository for researchers and breeders working towards the improvement of legume crops. In addition, the database provides unrestricted public access, and users can freely download all of the data it contains. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  18. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes.

    PubMed

    Bhawna; Bonthala, V S; Gajula, Mnv Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) represent a key component of the genome and are the most significant targets for producing stress-tolerant crops; functional genomic studies of these TFs are therefore important. Here we have constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. The database provides comprehensive information for each identified TF, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information can be used to expedite functional genomic studies of specific TFs of interest. The database aims to support functional genomics studies of common bean TFs and to help identify the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production. It offers search interfaces by gene ID and functional annotation, and browsing interfaces by family and by chromosome. The database will also serve as a central repository for researchers and breeders working towards the improvement of legume crops. In addition, the database provides unrestricted public access, and users can freely download all of the data it contains. Database URL: http://www.multiomics.in/PvTFDB/.

  19. Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

    2010-06-01

    The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
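
    The abstract notes that the electrostatic potential correlates with GC content without being determined by it. The sketch below computes only the sliding-window GC profile that one might use as a baseline for such a comparison; it does not compute electrostatic potential and is not DEPPDB's method. The function names, window size and example sequence are assumptions made for illustration.

```python
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int, step: int = 1) -> list[float]:
    """GC fraction for each full window of the given size, moving by step bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    example = "ATATGCGCGCATAT"  # hypothetical sequence, not taken from DEPPDB
    print(sliding_gc(example, window=4, step=2))
```

    Two windows with the same GC fraction can still differ in base order and flanking context, which is the dependence the abstract reports for the potential profile.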

  20. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  1. LegumeIP: an integrative database for comparative genomics and transcriptomics of model legumes.

    PubMed

    Li, Jun; Dai, Xinbin; Liu, Tingsong; Zhao, Patrick Xuechun

    2012-01-01

    Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They conduct symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max and Lotus japonicus, plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. Our new database is capable of systematic synteny analysis across M. truncatula, G. max, L. japonicus and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny and relative gene expression.

  2. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  3. BmTEdb: a collective database of transposable elements in the silkworm genome.

    PubMed

    Xu, Hong-En; Zhang, Hua-Hao; Xia, Tian; Han, Min-Jin; Shen, Yi-Hong; Zhang, Ze

    2013-01-01

    The silkworm, Bombyx mori, is one of the major insect model organisms, and its draft and fine genome sequences became available in 2004 and 2008, respectively. Transposable elements (TEs) constitute ~40% of the silkworm genome. To better understand the roles of TEs in the organization, structure and evolution of the silkworm genome, we used a combination of de novo, structure-based and homology-based approaches to identify silkworm TEs and found 1308 silkworm TE families. These TE families and their classification information were organized into a comprehensive and easy-to-use web-based database, BmTEdb. Users can browse, search and download the sequences in the database. Sequence analysis tools such as BLAST, HMMER and EMBOSS GetORF are also provided in BmTEdb. This database will facilitate studies of silkworm genomics, of TE functions in the silkworm and of the comparative analysis of insect TEs. Database URL: http://gene.cqu.edu.cn/BmTEdb/.

  4. A searchable database for the genome of Phomopsis longicolla (isolate MSPL 10-6).

    PubMed

    Darwish, Omar; Li, Shuxian; May, Zane; Matthews, Benjamin; Alkharouf, Nadim W

    2016-01-01

    Phomopsis longicolla (syn. Diaporthe longicolla) is an important seed-borne fungal pathogen that primarily causes Phomopsis seed decay (PSD) in most soybean production areas worldwide. This disease severely decreases soybean seed quality by reducing seed viability and oil quality, altering seed composition, and increasing frequencies of moldy and/or split beans. To facilitate investigation of the genetic basis of fungal virulence factors and understand the mechanism of disease development, we designed and developed a database for P. longicolla isolate MSPL 10-6 that contains information about the genome assemblies (contigs), gene models, gene descriptions and GO functional ontologies. A web-based front end to the database was built using ASP.NET, which allows researchers to search and mine the genome of this important fungus. This database represents the first reported genome database for a seed-borne fungal pathogen in the Diaporthe-Phomopsis complex. The database will also be a valuable resource for research and agricultural communities. It will aid in the development of new control strategies for this pathogen.

  5. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics

    PubMed Central

    Patel, Ravi K.; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in the human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high-quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea is available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. Utilities such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that the integrated information content of this database will accelerate functional and applied genomic research for the improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html. PMID:26322998

  6. RadishBase: a database for genomics and genetics of radish.

    PubMed

    Shen, Di; Sun, Honghe; Huang, Mingyun; Zheng, Yi; Li, Xixiang; Fei, Zhangjun

    2013-02-01

    Radish is an economically important vegetable crop. During the past several years, large-scale genomics and genetics resources have been accumulated for this species. To store, query, analyze and integrate these radish resources efficiently, we have developed RadishBase (http://bioinfo.bti.cornell.edu/radish), a genomics and genetics database of radish. Currently the database contains radish mitochondrial genome sequences, expressed sequence tag (EST) and unigene sequences and annotations, biochemical pathways, EST-derived single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers, and genetic maps. RadishBase is designed to enable users easily to retrieve and visualize biologically important information through a set of efficient query interfaces and analysis tools, including the BLAST search and unigene annotation query interfaces, and tools to classify unigenes functionally, to identify enriched gene ontology (GO) terms and to visualize genetic maps. A database containing radish pathways predicted from unigene sequences is also included in RadishBase. The tools and interfaces in RadishBase allow efficient mining of recently released and continually expanding large-scale radish genomics and genetics data sets, including the radish genome sequences and RNA-seq data sets.

  7. CoGemiR: A comparative genomics microRNA database

    PubMed Central

    Maselli, Vincenza; Di Bernardo, Diego; Banfi, Sandro

    2008-01-01

    Background: MicroRNAs are small, highly conserved non-coding RNAs that play an important role in regulating gene expression by binding the 3'UTR of target mRNAs. The majority of microRNAs are localized within other transcriptional units (host genes) and are co-expressed with them, which strongly suggests that microRNAs and their host genes use the same promoter and other expression control elements. The remaining fraction of microRNAs is intergenic and is endowed with an independent regulatory region. A number of databases have already been developed to collect information about microRNAs, but none of them allows easy exploration of microRNA genomic organization across evolution. Results: CoGemiR is a publicly available microRNA-centered database whose aim is to offer an overview of the genomic organization of microRNAs and of the extent of its conservation during evolution in different metazoan species. The database collects information on the genomic location, conservation and expression of both known and newly predicted microRNAs and displays the data with an emphasis on comparison across species. The database also includes a microRNA prediction pipeline to annotate microRNAs in recently sequenced genomes. This information is easily accessible via the web through a user-friendly query page. The CoGemiR database is available online. Conclusion: Knowledge of the genomic organization of microRNAs can provide useful information for understanding their biology. To obtain a comparative genomics overview of microRNA genomic organization, we developed CoGemiR. To achieve this goal, we both collected and integrated data from pre-existing databases and generated new data, such as the identification in several species of a number of previously unannotated microRNAs. For a more effective use of these data, we developed a user-friendly web interface that shows how the genomic context of a microRNA is related across different species. PMID:18837977

  8. Cycloquest: Identification of cyclopeptides via database search of their mass spectra against genome databases

    PubMed Central

    Mohimani, Hosein; Liu, Wei-Ting; Mylne, Joshua S.; Poth, Aaron G.; Colgrave, Michelle L.; Tran, Dat; Selsted, Michael E.; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2011-01-01

    Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance spectroscopy (NMR). MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization, and different fragmentation pattern of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry, and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque. PMID:21851130
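
    The parent mass shift mentioned in this abstract can be illustrated with a short calculation. The sketch below assumes head-to-tail backbone cyclization, where forming the extra peptide bond releases one water molecule, so the cyclic form is lighter than the linear form by the mass of H2O. It is not the Cycloquest algorithm; the function names and the example sequence are hypothetical, and only a handful of standard monoisotopic residue masses are included.

```python
# Monoisotopic residue masses in daltons for a few amino acids (extend as needed).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "F": 147.06841,
}
WATER = 18.01056  # monoisotopic mass of H2O in daltons


def residue_sum(peptide: str) -> float:
    """Sum of residue masses; raises KeyError for residues not in the table."""
    return sum(RESIDUE_MASS[residue] for residue in peptide)


def linear_mass(peptide: str) -> float:
    """Neutral mass of the linear peptide: residues plus one water (free termini)."""
    return residue_sum(peptide) + WATER


def cyclic_mass(peptide: str) -> float:
    """Neutral mass of the head-to-tail cyclized peptide: residues only."""
    return residue_sum(peptide)


if __name__ == "__main__":
    example = "GAVLF"  # hypothetical sequence, not one of the peptides in the paper
    shift = linear_mass(example) - cyclic_mass(example)
    print(f"linear {linear_mass(example):.4f} Da, cyclic {cyclic_mass(example):.4f} Da")
    print(f"parent mass shift {shift:.4f} Da")
```

    A search tool that assumes the linear mass would therefore look for the wrong parent mass, and every rotation of the sequence (for example AVLFG) gives the same cyclic mass, which is part of why the fragmentation pattern also has to be handled differently.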

  9. Towards a Universal Clinical Genomics Database: the 2012 International Standards for Cytogenomic Arrays Consortium Meeting.

    PubMed

    Riggs, Erin Rooney; Wain, Karen E; Riethmaier, Darlene; Savage, Melissa; Smith-Packard, Bethanny; Kaminsky, Erin B; Rehm, Heidi L; Martin, Christa Lese; Ledbetter, David H; Faucett, W Andrew

    2013-06-01

    The 2012 International Standards for Cytogenomic Arrays (ISCA) Consortium Meeting, "Towards a Universal Clinical Genomic Database," was held in Bethesda, Maryland, May 21-22, 2012, and was attended by over 200 individuals from around the world representing clinical genetic testing laboratories, clinicians, academia, industry, research, and regulatory agencies. The scientific program centered on expanding the current focus of the ISCA Consortium to include the collection and curation of both structural and sequence-level variation into a unified clinical genomics database, available to the public through resources such as the National Center for Biotechnology Information's ClinVar database. Here, we provide an overview of the conference, with summaries of the topics presented for discussion by over 25 different speakers. Presentations are available online at www.iscaconsortium.org.

  10. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  11. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293

    PubMed Central

    Kanhayuwa, Lakkhana; Coutts, Robert H. A.

    2016-01-01

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, designated AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4–14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140 to 493 bp and were present in low copy number in the genome, and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a, possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and to an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, designated AfuSINE3, were also identified, and their 5'-5S rRNA-related regions show 50–65% similarity to A. fumigatus 5S rRNAs and 60–75% similarity to SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259 to 343 bp, and two out of five AfuSINE3 sequences were actively transcribed. Investigations of AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity. PMID:27736869

  12. Genome Shuffling of Mangrove Endophytic Aspergillus luchuensis MERV10 for Improving the Cholesterol-Lowering Agent Lovastatin under Solid State Fermentation

    PubMed Central

    El-Gendy, Mervat Morsy Abbas Ahmed; Al-Zahrani, Hind A. A.

    2016-01-01

    In the screening of marine mangrove derived fungi for lovastatin productivity, endophytic Aspergillus luchuensis MERV10 exhibited the highest lovastatin productivity (9.5 mg/gds) in solid state fermentation (SSF) using rice bran. Aspergillus luchuensis MERV10 was used as the parental strain in which to induce genetic variabilities after application of different mixtures as well as doses of mutagens followed by three successive rounds of genome shuffling. Four potent mutants, UN6, UN28, NE11, and NE23, with lovastatin productivity equal to 2.0-, 2.11-, 1.95-, and 2.11-fold higher than the parental strain, respectively, were applied for three rounds of genome shuffling as the initial mutants. Four hereditarily stable recombinants (F3/3, F3/7, F3/9, and F3/13) were obtained with lovastatin productivity equal to 50.8, 57.0, 49.7, and 51.0 mg/gds, respectively. Recombinant strain F3/7 yielded 57.0 mg/gds of lovastatin, which is 6-fold and 2.85-fold higher, respectively, than the initial parental strain and the highest mutants UN28 and NE23. It was therefore selected for the optimization of lovastatin production through improvement of SSF parameters. Lovastatin productivity was increased 32-fold through strain improvement methods, including mutations and three successive rounds of genome shuffling followed by optimizing SSF factors. PMID:27790068
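
    The fold improvements quoted in this abstract can be re-derived from the reported yields. The sketch below is a plain arithmetic check using only numbers from the abstract; the variable and function names are hypothetical, and the mutant yields are back-calculated from the quoted fold values rather than reported directly.

```python
PARENT_YIELD = 9.5  # mg/gds, parental strain MERV10

# Recombinant yields in mg/gds, as reported in the abstract.
RECOMBINANT_YIELD = {"F3/3": 50.8, "F3/7": 57.0, "F3/9": 49.7, "F3/13": 51.0}

# Mutant productivity as fold over the parental strain, as reported.
MUTANT_FOLD = {"UN6": 2.0, "UN28": 2.11, "NE11": 1.95, "NE23": 2.11}


def fold_change(value: float, reference: float) -> float:
    """Ratio of value to reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference


# Back-calculated (not reported) yield of the best mutants, in mg/gds.
best_mutant_yield = max(MUTANT_FOLD.values()) * PARENT_YIELD

best_name = max(RECOMBINANT_YIELD, key=RECOMBINANT_YIELD.get)
fold_vs_parent = fold_change(RECOMBINANT_YIELD[best_name], PARENT_YIELD)
fold_vs_mutant = fold_change(RECOMBINANT_YIELD[best_name], best_mutant_yield)

if __name__ == "__main__":
    print(f"{best_name}: {fold_vs_parent:.2f}-fold over parent, "
          f"{fold_vs_mutant:.2f}-fold over best mutant")
```

    With these inputs F3/7 comes out at 6.0-fold over the parent and about 2.84-fold over the best mutants, close to the 2.85-fold quoted; the small gap presumably reflects rounding in the reported fold values.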

  13. Design and implementation of a database for Brucella melitensis genome annotation.

    PubMed

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analyses. Moreover, the automatic genome annotation of the pathogenic bacterium Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start-site predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenetic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses the classical keywords or sequence identifiers; (2) the original advanced search engine allows users to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helices (TMHs); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following URL: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
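    The advanced search described above combines textual and numerical criteria with logical operators. A minimal sketch of that idea follows, using Python's built-in sqlite3 module. The table name, column names and rows are hypothetical placeholders and do not reflect the actual alphaPAGe schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of predicted coding sequences (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pcds ("
        "identifier TEXT PRIMARY KEY, function TEXT, "
        "isoelectric_point REAL, molecular_weight REAL, tmh_count INTEGER)"
    )
    # Placeholder rows for illustration only; not real B. melitensis annotations.
    rows = [
        ("demo_0001", "ABC transporter permease", 9.1, 31000.0, 6),
        ("demo_0002", "ABC transporter ATP-binding protein", 5.8, 27000.0, 0),
        ("demo_0003", "sugar transporter", 6.2, 45000.0, 12),
        ("demo_0004", "hypothetical protein", 6.0, 12000.0, 1),
    ]
    conn.executemany("INSERT INTO pcds VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def advanced_search(conn, keyword, max_pi, min_tmh):
    """Combine a keyword match with two numerical criteria using AND."""
    sql = (
        "SELECT identifier FROM pcds "
        "WHERE function LIKE ? AND isoelectric_point <= ? AND tmh_count >= ? "
        "ORDER BY identifier"
    )
    params = (f"%{keyword}%", max_pi, min_tmh)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    demo = build_demo_db()
    print(advanced_search(demo, "transporter", 7.0, 1))
```

    Parameter binding (the `?` placeholders) keeps user-supplied search terms out of the SQL text, which matters for any interface that also exposes a free SQL field.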

  14. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts' knowledge stored in the databases and build a new hypothesis about the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  15. The mouse genome database: genotypes, phenotypes, and models of human disease.

    PubMed

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2013-01-01

    The laboratory mouse is the premier animal model for studying human biology because all life stages can be accessed experimentally, a completely sequenced reference genome is publicly available and there exists a myriad of genomic tools for comparative and experimental research. In the current era of genome scale, data-driven biomedical research, the integration of genetic, genomic and biological data is essential for realizing the full potential of the mouse as an experimental model. The Mouse Genome Database (MGD; http://www.informatics.jax.org), the community model organism database for the laboratory mouse, is designed to facilitate the use of the laboratory mouse as a model system for understanding human biology and disease. To achieve this goal, MGD integrates genetic and genomic data related to the functional and phenotypic characterization of mouse genes and alleles and serves as a comprehensive catalog for mouse models of human disease. Recent enhancements to MGD include the addition of human ortholog details to mouse Gene Detail pages, the inclusion of microRNA knockouts in MGD's catalog of alleles and phenotypes, the addition of video clips to phenotype images, access to genotype and phenotype data associated with quantitative trait loci (QTL) and improvements to the layout and display of Gene Ontology annotations.

  16. CottonGen: a genomics, genetics and breeding database for cotton research

    PubMed Central

    Yu, Jing; Jung, Sook; Cheng, Chun-Huai; Ficklin, Stephen P.; Lee, Taein; Zheng, Ping; Jones, Don; Percy, Richard G.; Main, Dorrie

    2014-01-01

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST. PMID:24203703

  17. Construction of a Pan-Genome Allele Database of Salmonella enterica Serovar Enteritidis for Molecular Subtyping and Disease Cluster Identification

    PubMed Central

    Liu, Yen-Yi; Chen, Chih-Chieh; Chiou, Chien-Shun

    2016-01-01

    We built a pan-genome allele database with 395 genomes of Salmonella enterica serovar Enteritidis and developed computer tools for analysis of whole genome sequencing (WGS) data of bacterial isolates for disease cluster identification. A web server (http://wgmlst.imst.nsysu.edu.tw) was set up with the database and the tools, allowing users to upload WGS data to generate whole genome multilocus sequence typing (wgMLST) profiles and to perform cluster analysis of wgMLST profiles. The usefulness of the database in disease cluster identification was demonstrated by analyzing a panel of genomes from 55 epidemiologically well-defined S. Enteritidis isolates provided by the Minnesota Department of Health. The wgMLST-based cluster analysis revealed distinct clades that were concordant with the epidemiologically defined outbreaks. Thus, using a common pan-genome allele database, wgMLST can be a promising WGS-based subtyping approach for disease surveillance and outbreak investigation across laboratories. PMID:28018331
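    To make the idea of comparing wgMLST profiles concrete, the sketch below counts allele differences between profiles and groups isolates by single linkage under a distance threshold. The abstract does not state which distance or clustering method the authors used, so both choices are assumptions, and the isolate names, loci and allele numbers are invented for illustration.

```python
def allele_distance(profile_a, profile_b):
    """Count loci called in both profiles whose allele identifiers differ."""
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        # Loci missing (None) in either profile are ignored.
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


def single_linkage_clusters(profiles, threshold):
    """Group isolates connected by chains of pairwise distances <= threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if allele_distance(profiles[first], profiles[second]) <= threshold:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical profiles: locus name -> allele number (None = locus not called).
DEMO_PROFILES = {
    "iso_A": {"locus1": 1, "locus2": 1, "locus3": 1, "locus4": 1},
    "iso_B": {"locus1": 1, "locus2": 1, "locus3": 1, "locus4": 2},
    "iso_C": {"locus1": 5, "locus2": 6, "locus3": 7, "locus4": 8},
    "iso_D": {"locus1": 5, "locus2": 6, "locus3": 7, "locus4": None},
}

if __name__ == "__main__":
    print(single_linkage_clusters(DEMO_PROFILES, threshold=1))
```

    With a threshold of one allele difference, the four invented isolates fall into two groups, which is the kind of output that would then be compared against epidemiologically defined outbreaks.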

  18. VIDA: a virus database system for the organization of animal virus genome open reading frames.

    PubMed

    Albà, M M; Lee, D; Pearl, F M; Shepherd, A J; Martin, N; Orengo, C A; Kellam, P

    2001-01-01

    VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.

  19. VIDA: a virus database system for the organization of animal virus genome open reading frames

    PubMed Central

    Albà, M. Mar; Lee, David; Pearl, Frances M. G.; Shepherd, Adrian J.; Martin, Nigel; Orengo, Christine A.; Kellam, Paul

    2001-01-01

    VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html. PMID:11125070

  20. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes.

    PubMed

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P S

    2016-01-18

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered as potential cancer targets. In dbEM, modifiers are classified on a functional basis and include 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated in dbEM, e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein-based cancer therapeutics. This database is publicly available at URL: http://crdd.osdd.net/raghava/dbem.

  1. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    NASA Astrophysics Data System (ADS)

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered as potential cancer targets. In dbEM, modifiers are classified on a functional basis and include 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated in dbEM, e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein-based cancer therapeutics. This database is publicly available at URL: http://crdd.osdd.net/raghava/dbem.

  2. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  3. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of scientific research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with the utilization of crop plant databases in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  4. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

    PubMed Central

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033

  5. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    PubMed

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp.

  6. Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

    PubMed Central

    Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

    2014-01-01

    The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the number and type encoded in fungal genomes vary considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. In the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134

  7. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.

  8. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  9. Genome Data from DOOR: a Database for prOkaryotic OpeRons

    DOE Data Explorer

    DOOR (Database of prOkaryotic OpeRons) is an operon database developed by the Computational Systems Biology Lab (CSBL) at the University of Georgia. Although the operons in the database are based on prediction, the database has some unique features:
    • An algorithm that is consistently best in all aspects, including sensitivity and specificity for both true positives and true negatives, with an overall accuracy reaching 90 percent. The prediction algorithm is based on this paper: P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu, Operon prediction using both genome-specific and general genomic information, Nucleic Acids Res., 35(1):288-98, 2007.
    • One of the largest data sets of operon information available to the public. DOOR provides operons for 675 prokaryotic genomes. Although most of the operons in DOOR are not verified by experiments, the creators are also trying to provide some limited literature information, which is extracted from ODB. They emphasize that users looking for strictly experimentally verified operons should look into DBTBS and RegulonDB first.
    • Operons that include RNA genes, which are rarely seen in other operon databases, especially predicted operon databases.
    • Defined similarity scores between operons, which are based on weighted maximum matching between operons. Similar operon groups can be used to predict accurate orthologous genes, and their upstream regions can be used to find the consensus binding motifs.
    • Integration of two motif finding programs in the database: MEME and CUBIC.
    DOOR provides an Organism View for browsing, a gene search tool, an operon search tool, and the operon prediction interface. [Text taken and edited from http://csbl1.bmb.uga.edu/OperonDB/tutorial.php]
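    The operon similarity score mentioned above rests on weighted maximum matching: each gene in one operon is paired with at most one gene in the other so that the summed pair weights are as large as possible. The sketch below illustrates that idea by brute force for small operons. It is not DOOR's implementation; the gene names and pair weights are invented, and dividing by the longer operon's length is an assumed normalization.

```python
from itertools import permutations


def pair_weight(weights, gene_x, gene_y):
    """Look up a symmetric gene-pair weight; unlisted pairs score 0."""
    return weights.get((gene_x, gene_y), weights.get((gene_y, gene_x), 0.0))


def max_weight_matching_score(operon_a, operon_b, weights):
    """Best total weight over all one-to-one gene pairings (brute force)."""
    shorter, longer = sorted((operon_a, operon_b), key=len)
    best = 0.0
    # Try every ordered selection of genes from the longer operon.
    for selection in permutations(longer, len(shorter)):
        total = sum(pair_weight(weights, x, y) for x, y in zip(shorter, selection))
        best = max(best, total)
    return best


def operon_similarity(operon_a, operon_b, weights):
    """Matching score scaled by the longer operon's length (assumed normalization)."""
    longest = max(len(operon_a), len(operon_b))
    if longest == 0:
        return 0.0
    return max_weight_matching_score(operon_a, operon_b, weights) / longest


# Hypothetical operons and gene-pair similarity weights.
OPERON_A = ["a1", "a2"]
OPERON_B = ["b1", "b2", "b3"]
WEIGHTS = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.2,
    ("a2", "b1"): 0.7,
    ("a2", "b2"): 0.8,
    ("a2", "b3"): 0.1,
}

if __name__ == "__main__":
    print(max_weight_matching_score(OPERON_A, OPERON_B, WEIGHTS))
    print(operon_similarity(OPERON_A, OPERON_B, WEIGHTS))
```

    Brute force is only practical for operons of a few genes; a polynomial-time assignment algorithm would be the usual choice at database scale.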

  10. SorghumFDB: sorghum functional genomics database with multidimensional network analysis

    PubMed Central

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein–protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  11. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable for analysing the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes from some species. Analysing this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units built to withstand the heavy workload of sequence analysis. However, innovation is still needed on platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  12. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. 
All data associations are supported

  13. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    PubMed

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences.
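    The two data models compared in this abstract can be illustrated side by side. The sketch below stores one variant annotation as normalized relational rows and as a single JSON document. It uses SQLite from the Python standard library as a stand-in for the systems named in the abstract, makes no performance claim, and uses an invented record layout rather than the dbSNP schema.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory database holding both storage layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variant (variant_id TEXT PRIMARY KEY, chromosome TEXT, position INTEGER);
        CREATE TABLE variant_allele (variant_id TEXT, allele TEXT);
        CREATE TABLE variant_doc (variant_id TEXT PRIMARY KEY, body TEXT);
        """
    )
    return conn


def store_relational(conn, record):
    """Split the record across a parent table and a one-row-per-allele child table."""
    conn.execute(
        "INSERT INTO variant VALUES (?, ?, ?)",
        (record["variant_id"], record["chromosome"], record["position"]),
    )
    conn.executemany(
        "INSERT INTO variant_allele VALUES (?, ?)",
        [(record["variant_id"], allele) for allele in record["alleles"]],
    )


def store_document(conn, record):
    """Keep the whole record, nested list included, as one JSON document."""
    conn.execute(
        "INSERT INTO variant_doc VALUES (?, ?)",
        (record["variant_id"], json.dumps(record)),
    )


def alleles_relational(conn, variant_id):
    rows = conn.execute(
        "SELECT allele FROM variant_allele WHERE variant_id = ? ORDER BY allele",
        (variant_id,),
    )
    return [row[0] for row in rows]


def alleles_document(conn, variant_id):
    row = conn.execute(
        "SELECT body FROM variant_doc WHERE variant_id = ?", (variant_id,)
    ).fetchone()
    return sorted(json.loads(row[0])["alleles"])


# Invented example record; the identifier and coordinates are placeholders.
DEMO_RECORD = {
    "variant_id": "demo_variant_1",
    "chromosome": "1",
    "position": 12345,
    "alleles": ["A", "G"],
}

if __name__ == "__main__":
    db = make_db()
    store_relational(db, DEMO_RECORD)
    store_document(db, DEMO_RECORD)
    print(alleles_relational(db, "demo_variant_1"), alleles_document(db, "demo_variant_1"))
```

    The relational layout needs a second table and a filtered lookup to recover the alleles, while the document layout returns them with a single lookup but leaves the nested structure opaque to the database unless it offers JSON-aware indexing.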

  14. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).
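    A frame transition arises whenever an insertion or deletion inside a coding region has a net length that is not a multiple of three. The sketch below shows that effect on a toy sequence. It is not the GeneTack algorithm, which the abstract describes only as an ab initio frameshift finder; the sequence and function names are invented for illustration.

```python
def codons(sequence, frame=0):
    """Split a sequence into complete codons starting at the given frame offset."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


def disrupts_frame(net_indel_length):
    """True if an indel of this net length (insertions minus deletions) shifts the frame."""
    return net_indel_length % 3 != 0


def delete_base(sequence, index):
    """Return the sequence with the base at `index` removed."""
    return sequence[:index] + sequence[index + 1:]


# Toy coding sequence: start codon, two sense codons, stop codon.
ORIGINAL = "ATGGCCAAATGA"
MUTATED = delete_base(ORIGINAL, 4)  # single-base deletion inside the second codon

if __name__ == "__main__":
    print(codons(ORIGINAL))
    print(codons(MUTATED))
    print(disrupts_frame(len(MUTATED) - len(ORIGINAL)))
```

    After the deletion every downstream codon changes and the original stop codon is no longer read in frame, which is why such events disrupt protein-coding genes unless they reflect a recoding mechanism.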

  15. The need for high-quality whole-genome sequence databases in microbial forensics.

    PubMed

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  16. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  17. Genome sequence of Aspergillus flavus NRRL 3357, a strain that causes aflatoxin contamination of food and feed

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination of food and livestock feed results in significant annual crop losses internationally. Aspergillus flavus is the major fungus responsible for this loss. Additionally, A. flavus is the second leading cause of aspergillosis in immune compromised human patients. Here we report th...

  18. The Rat Genome Database, update 2007--easing the path from disease to data and back again.

    PubMed

    Twigger, Simon N; Shimoyama, Mary; Bromberg, Susan; Kwitek, Anne E; Jacob, Howard J

    2007-01-01

    The Rat Genome Database (RGD, http://rgd.mcw.edu) is one of the core resources for rat genomics and recent developments have focused on providing support for disease-based research using the rat model. Recognizing the importance of the rat as a disease model we have employed targeted curation strategies to curate genes, QTL and strain data for neurological and cardiovascular disease areas. This work has centered on rat but also includes data for mouse and human to create 'disease portals' that provide a unified view of the genes, QTL and strain models for these diseases across the three species. The disease curation efforts combined with normal curation activities have served to greatly increase the content of the database, particularly for biological information, including gene ontology, disease, pathway and phenotype ontology annotations. In addition to improving the features and database content, community outreach has been expanded to demonstrate how investigators can leverage the resources at RGD to facilitate their research and to elicit suggestions and needs for future developments. We have published a number of papers that provide additional information on the ontology annotations and the tools at RGD for data mining and analysis to better enable researchers to fully utilize the database.

  19. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements

    PubMed Central

    Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.

    2017-01-01

    The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040

  20. TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes.

    PubMed

    González, Abel D; Espinosa, Vladimir; Vasconcelos, Ana T; Pérez-Rueda, Ernesto; Collado-Vides, Julio

    2005-01-01

    Experimental data on the Escherichia coli transcriptional regulatory system have been used in recent years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach for which several proof-of-principle articles on large regulons have been published. TRACTOR_DB can currently be accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact email: tractor@cifn.unam.mx.

  1. The MiST2 database: a comprehensive genomics resource on microbial signal transduction.

    PubMed

    Ulrich, Luke E; Zhulin, Igor B

    2010-01-01

    The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. These are identified by searching protein sequences for specific domain profiles that implicate a protein in signal transduction. Compared to the previous version of the database, MiST2 contains a host of new features and improvements including the following: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis (6%), and finally ECF factors (2%).

  2. A web-based microsatellite database for the Magnaporthe oryzae genome

    PubMed Central

    Singh, Pankaj Kumar; Singh, Akshay; Pawar, Deepak V.; Devanna, B. N.; Singh, Jyoti; Sharma, Vinay; Sharma, Tilak R.

    2016-01-01

    Microsatellites have been widely used for molecular marker development. The codominant and multiallelic nature of these simple sequence repeats (SSRs) gives them several advantages over other types of molecular markers. Their broad applicability in areas of molecular biology such as gene mapping, genome characterization, genome evolution, and gene regulation has been reported in various crop plants, animals and fungi. Considering these benefits of SSR markers, the Magnaporthe oryzae Microsatellite Database (MMDB) was developed to help in understanding the pathogen and its strain-level diversity within a particular geographic region, which can in turn guide the proper use of blast resistance genes in that region. The database is based on the whole-genome sequences of two M. oryzae isolates, RML-29 (2665 SSRs from 43037792 bp) and RP-2421 (3169 SSRs from 45510614 bp). Although the first M. oryzae genome (70-15) was sequenced in 2005, that isolate is not a true field isolate of M. oryzae. Therefore, MMDB has great potential in the study of diversification and characterization of M. oryzae and other related fungi. Availability: http://14.139.229.199/home.aspx PMID:28293068
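
    As a rough illustration of how microsatellites can be located in a genome sequence, the sketch below scans a DNA string for perfect tandem repeats of 1 to 6 bp motifs using regular expressions, and computes SSR density per megabase from the counts quoted above. The minimum repeat thresholds, the function names and the demo sequence are assumptions chosen for this example; the abstract does not state which tool or thresholds were used to build MMDB.

```python
import re

# Assumed thresholds: minimum number of tandem copies per motif length (bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


def ssr_per_mb(ssr_count, genome_bp):
    """SSR density expressed as SSRs per million base pairs."""
    return ssr_count / genome_bp * 1_000_000


# Hypothetical demo sequence with one (AT)7 and one (CAG)5 repeat.
demo = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "TT"
print(find_ssrs(demo))

# Densities for the two isolates, from the counts given in the abstract.
print(ssr_per_mb(2665, 43037792), ssr_per_mb(3169, 45510614))
```

    With the figures quoted in the abstract, the densities come out at roughly 62 and 70 SSRs per Mb for RML-29 and RP-2421, respectively.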

  3. SmedGD 2.0: The Schmidtea mediterranea genome database.

    PubMed

    Robb, Sofia M C; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2015-08-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next-generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea.

  4. The MiST2 database: a comprehensive genomics resource on microbial signal transduction

    PubMed Central

    Ulrich, Luke E.; Zhulin, Igor B.

    2010-01-01

    The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. These are identified by searching protein sequences for specific domain profiles that implicate a protein in signal transduction. Compared to the previous version of the database, MiST2 contains a host of new features and improvements including the following: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis (6%), and finally ECF factors (2%). PMID:19900966

  5. SmedGD 2.0: The Schmidtea mediterranea genome database

    PubMed Central

    Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2016-01-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588

  6. PReMod: a database of genome-wide mammalian cis-regulatory module predictions.

    PubMed

    Ferretti, Vincent; Poitras, Christian; Bergeron, Dominique; Coulombe, Benoit; Robert, François; Blanchette, Mathieu

    2007-01-01

    We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656-668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factors (TF) binding sites. Contrary to other existing databases, PReMod is not restricted to modules located proximal to genes, but in fact mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs) or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod: http://genomequebec.mcgill.ca/PReMod.

  7. PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database

    PubMed Central

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-01-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. PMID:26903996
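
    The two-stage idea described above (group proteins by assigned function first, then split each group into families) can be sketched in a few lines of Python. This is a simplified stand-in and not the PATtyFams implementation: it assumes function labels are already given rather than derived by RAST, and it replaces the Markov Cluster algorithm with plain single-linkage clustering on k-mer Jaccard similarity. All names, thresholds and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Set of all overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def families(proteins, k=3, threshold=0.5):
    """proteins: iterable of (protein_id, function_label, sequence)."""
    # Stage 1: bucket proteins by their assigned function.
    by_function = defaultdict(list)
    for pid, function, seq in proteins:
        by_function[function].append((pid, kmers(seq, k)))

    # Stage 2: within each bucket, single-linkage clustering
    # (a simple substitute for MCL in this sketch).
    result = []
    for function, members in by_function.items():
        clusters = []
        for pid, ks in members:
            linked, unlinked = [], []
            for c in clusters:
                similar = any(jaccard(ks, other) >= threshold for _, other in c)
                (linked if similar else unlinked).append(c)
            merged = [(pid, ks)] + [m for c in linked for m in c]
            clusters = unlinked + [merged]
        result.extend((function, sorted(p for p, _ in c)) for c in clusters)
    return result


# Hypothetical toy input.
proteins = [
    ("p1", "kinase", "MKVLAAGIVG"),
    ("p2", "kinase", "MKVLAAGIVA"),
    ("p3", "kinase", "QQWPRRTTSY"),
    ("p4", "transporter", "MKVLAAGIVG"),
]
print(families(proteins))
```

    Bucketing by function first means pairwise comparisons are only made within a bucket, which hints at why a function-guided first pass helps as the number of genomes grows.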

  8. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    SciTech Connect

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  9. CyanoBase and RhizoBase: databases of manually curated annotations for cyanobacterial and rhizobial genomes.

    PubMed

    Fujisawa, Takatomo; Okamoto, Shinobu; Katayama, Toshiaki; Nakao, Mitsuteru; Yoshimura, Hidehisa; Kajiya-Kanegae, Hiromi; Yamamoto, Sumiko; Yano, Chiyoko; Yanaka, Yuka; Maita, Hiroko; Kaneko, Takakazu; Tabata, Satoshi; Nakamura, Yasukazu

    2014-01-01

    To understand newly sequenced genomes of closely related species, comprehensively curated reference genome databases are becoming increasingly important. We have extended CyanoBase (http://genome.microbedb.jp/cyanobase), a genome database for cyanobacteria, and newly developed RhizoBase (http://genome.microbedb.jp/rhizobase), a genome database for rhizobia, nitrogen-fixing bacteria associated with leguminous plants. Both databases focus on the representation and reusability of reference genome annotations, which are continuously updated by manual curation. Domain experts have extracted names, products and functions of each gene reported in the literature. To ensure effectiveness of this procedure, we developed the TogoAnnotation system offering a web-based user interface and a uniform storage of annotations for the curators of the CyanoBase and RhizoBase databases. The number of references investigated for CyanoBase increased from 2260 in our previous report to 5285, and for RhizoBase, we perused 1216 references. The results of these intensive annotations are displayed on the GeneView pages of each database. Advanced users can also retrieve this information through the representational state transfer-based web application programming interface in an automated manner.

  10. Construction of an Ortholog Database Using the Semantic Web Technology for Integrative Analysis of Genomic Data

    PubMed Central

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis. PMID:25875762
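
    The role of ortholog information as a hub can be illustrated with a toy in-memory triple store in Python. The sketch below is only an analogy for the RDF/SPARQL setup described above: the predicate names (`hasMember`, `inTaxon`, `annotatedWith`), the gene and group identifiers, and the helper functions are all hypothetical and are not taken from OrthO or MBGD.

```python
# Hypothetical (subject, predicate, object) triples.
triples = [
    ("group:OG1", "hasMember", "gene:ecoA"),
    ("group:OG1", "hasMember", "gene:bsuA"),
    ("gene:ecoA", "inTaxon", "taxon:Escherichia_coli"),
    ("gene:bsuA", "inTaxon", "taxon:Bacillus_subtilis"),
    ("gene:ecoA", "annotatedWith", "GO:0006412"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def ortholog_annotations(gene):
    """GO terms attached to other members of the gene's ortholog group(s)."""
    found = set()
    for group in subjects("hasMember", gene):
        for member in objects(group, "hasMember"):
            if member != gene:
                found.update(objects(member, "annotatedWith"))
    return sorted(found)


# The unannotated gene reaches a GO term through its ortholog group.
print(ortholog_annotations("gene:bsuA"))
```

    In an actual RDF setting the same join would be expressed as a graph pattern in a SPARQL query rather than as nested Python loops.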

  11. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    PubMed

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

  12. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    SciTech Connect

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-07-23

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.

  13. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    PubMed Central

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-01-01

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov. PMID:18845571

  14. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    PubMed

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.

  15. Rat Genome Database: a unique resource for rat, human, and mouse quantitative trait locus data.

    PubMed

    Nigam, Rajni; Laulederkind, Stanley J F; Hayman, G Thomas; Smith, Jennifer R; Wang, Shur-Jen; Lowry, Timothy F; Petri, Victoria; De Pons, Jeff; Tutaj, Marek; Liu, Weisong; Jayaraman, Pushkala; Munzenmaier, Diane H; Worthey, Elizabeth A; Dwinell, Melinda R; Shimoyama, Mary; Jacob, Howard J

    2013-09-16

    The rat has been widely used as a disease model in a laboratory setting, resulting in an abundance of genetic and phenotype data from a wide variety of studies. These data can be found at the Rat Genome Database (RGD, http://rgd.mcw.edu/), which provides a platform for researchers interested in linking genomic variations to phenotypes. Quantitative trait loci (QTLs) form one of the earliest and core datasets, allowing researchers to identify loci harboring genes associated with disease. These QTLs are not only important for those using the rat to identify genes and regions associated with disease, but also for cross-organism analyses of syntenic regions on the mouse and the human genomes to identify potential regions for study in these organisms. Currently, RGD has data on >1,900 rat QTLs that include details about the methods and animals used to determine the respective QTL along with the genomic positions and markers that define the region. RGD also curates human QTLs (>1,900) and houses >4,000 mouse QTLs (imported from Mouse Genome Informatics). Multiple ontologies are used to standardize traits, phenotypes, diseases, and experimental methods to facilitate queries, analyses, and cross-organism comparisons. QTLs are visualized in tools such as GBrowse and GViewer, with additional tools for analysis of gene sets within QTL regions. The QTL data at RGD provide valuable information for the study of mapped phenotypes and identification of candidate genes for disease associations.

  16. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease

    PubMed Central

    Eppig, Janan T.; Blake, Judith A.; Bult, Carol J.; Kadin, James A.; Richardson, Joel E.

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse–human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human–Mouse: Disease Connection, allows users to explore gene–phenotype–disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. PMID:25348401

  17. The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research.

    PubMed

    Waugh, M; Hraber, P; Weller, J; Wu, Y; Chen, G; Inman, J; Kiphart, D; Sobral, B

    2000-01-01

    The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant pathogenic oomycete, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.

  18. Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew

    PubMed Central

    Fan, Yu; Yu, Dandan; Yao, Yong-Gang

    2014-01-01

    The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as the model animal in their studies. With the aim of making access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This is a web-based platform that integrates the currently available data from the tree shrew genome, including an updated gene set, with a systematic functional annotation and an mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, will undoubtedly help in many areas of biological research into the tree shrew. PMID:25413576

  19. A survey of locus-specific database curation. Human Genome Variation Society.

    PubMed

    Cotton, Richard G H; Phillips, Kate; Horaitis, Ourania

    2007-04-01

    It is widely accepted that curation of variation in genes is best performed by experts in those genes and their variation. However, obtaining funding for such variation is difficult even though up-to-date lists of variations in genes are essential for optimum delivery of genetic healthcare and for medical research. This study was undertaken to gather information on gene-specific databases (locus-specific databases) in an effort to understand their functioning, funding and needs. A questionnaire was sent to 125 curators and we received 47 responses. Individuals performed curation of up to 69 genes. The time curators spent curating was extremely variable. This ranged from 0 h per week up to 5 curators spending over 4 h per week. The funding required ranged from US$600 to US$45,000 per year. Most databases were stimulated by the Human Genome Organization-Mutation Database Initiative and used their guidelines. Many databases reported unpublished mutations, with all but one respondent reporting errors in the literature. Of the 13 who reported hit rates, 9 reported over 52,000 hits per year. On the basis of this, five recommendations were made to improve the curation of variation information, particularly that of mutations causing single-gene disorder: 1. A curator for each gene, who is an expert in it, should be identified or nominated. 2. Curation at a minimum of 2 h per week at US$2000 per gene per year should be encouraged. 3. Guidelines and custom software use should be encouraged to facilitate easy setup and curation. 4. Hits per week on the website should be recorded to allow the importance of the site to be illustrated for grant-giving purposes. 5. Published protocols should be followed in the establishment of locus-specific databases.

  20. Overlap of the cancer genome atlas and the immune epitope database.

    PubMed

    Sait, Shaimaa; Fawcett, Timothy; Blanck, George

    2016-10-01

    Mutant peptides resulting from cancer drivers or passenger mutations are expected to have the potential to serve as a basis for cancer vaccines. However, a number of parameters regulate vaccine-associated immunogenicity, including the suitability of a peptide for binding to an antigen-presenting molecule or antibody. In order to obtain a basic indication of the prospect of human cancer epitope identification via current database development strategies, an overlap of the mutant Homo sapiens epitopes listed on the Immune Epitope Database (IEDB) and the mutant peptides indicated by The Cancer Genome Atlas (TCGA) somatic mutation database was obtained. No putative TCGA mutant peptides were detected among the 8,890 14-18 amino acid (AA) IEDB peptides available. In total, 3 IEDB mutant epitopes that encompassed a TCGA mutant AA position, but did not overlap the exact position of the TCGA mutant AA, were detected. The results of the present analysis confirm that verification of certain aspects of cancer epitope function can be obtained via the continued and systematic expansion of databases representing human protein epitopes. However, the analysis also indicates that there is relatively limited systematic information available regarding antigen-presenting molecule epitopes and cancer-related mutant peptides.

  1. Overlap of the cancer genome atlas and the immune epitope database

    PubMed Central

    Sait, Shaimaa; Fawcett, Timothy; Blanck, George

    2016-01-01

    Mutant peptides resulting from cancer drivers or passenger mutations are expected to have the potential to serve as a basis for cancer vaccines. However, a number of parameters regulate vaccine-associated immunogenicity, including the suitability of a peptide for binding to an antigen-presenting molecule or antibody. In order to obtain a basic indication of the prospect of human cancer epitope identification via current database development strategies, an overlap of the mutant Homo sapiens epitopes listed on the Immune Epitope Database (IEDB) and the mutant peptides indicated by The Cancer Genome Atlas (TCGA) somatic mutation database was obtained. No putative TCGA mutant peptides were detected among the 8,890 14–18 amino acid (AA) IEDB peptides available. In total, 3 IEDB mutant epitopes that encompassed a TCGA mutant AA position, but did not overlap the exact position of the TCGA mutant AA, were detected. The results of the present analysis confirm that verification of certain aspects of cancer epitope function can be obtained via the continued and systematic expansion of databases representing human protein epitopes. However, the analysis also indicates that there is relatively limited systematic information available regarding antigen-presenting molecule epitopes and cancer-related mutant peptides. PMID:27703532

  2. Strategies to explore functional genomics data sets in NCBI's GEO database.

    PubMed

    Wilhite, Stephen E; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.

  3. Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse

    PubMed Central

    Blake, Judith A.; Eppig, Janan T.; Kadin, James A.; Richardson, Joel E.; Smith, Cynthia L.; Bult, Carol J.

    2017-01-01

    The Mouse Genome Database (MGD: http://www.informatics.jax.org) is the primary community data resource for the laboratory mouse. It provides a highly integrated and highly curated system offering a comprehensive view of current knowledge about mouse genes, genetic markers and genomic features as well as the associations of those features with sequence, phenotypes, functional and comparative information, and their relationships to human diseases. MGD continues to enhance access to these data, to extend the scope of data content and visualizations, and to provide infrastructure and user support that ensures effective and efficient use of MGD in the advancement of scientific knowledge. Here, we report on recent enhancements made to the resource and new features. PMID:27899570

  4. Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases

    NASA Astrophysics Data System (ADS)

    Jain, Anubhav; Persson, Kristin A.; Ceder, Gerbrand

    2016-05-01

    Materials innovations enable new technological capabilities and drive major societal advancements but have historically required long and costly development cycles. The Materials Genome Initiative (MGI) aims to greatly reduce this time and cost. In this paper, we focus on data reuse in the MGI and, in particular, discuss the impact of three different computational databases based on density functional theory methods to the research community. We also discuss and provide recommendations on technical aspects of data reuse, outline remaining fundamental challenges, and present an outlook on the future of MGI's vision of data sharing.

  5. Mining the Plasmodium genome database to define organellar function: what does the apicoplast do?

    PubMed Central

    Roos, David S; Crawford, Michael J; Donald, Robert G K; Fraunholz, Martin; Harb, Omar S; He, Cynthia Y; Kissinger, Jessica C; Shaw, Michael K; Striepen, Boris

    2002-01-01

    Apicomplexan species constitute a diverse group of parasitic protozoa, which are responsible for a wide range of diseases in many organisms. Despite differences in the diseases they cause, these parasites share an underlying biology, from the genetic controls used to differentiate through the complex parasite life cycle, to the basic biochemical pathways employed for intracellular survival, to the distinctive cell biology necessary for host cell attachment and invasion. Different parasites lend themselves to the study of different aspects of parasite biology: Eimeria for biochemical studies, Toxoplasma for molecular genetic and cell biological investigation, etc. The Plasmodium falciparum Genome Project contributes the first large-scale genomic sequence for an apicomplexan parasite. The Plasmodium Genome Database (http://PlasmoDB.org) has been designed to permit individual investigators to ask their own questions, even prior to formal release of the reference P. falciparum genome sequence. As a case in point, PlasmoDB has been exploited to identify metabolic pathways associated with the apicomplexan plastid, or 'apicoplast' - an essential organelle derived by secondary endosymbiosis of an alga, and retention of the algal plastid. PMID:11839180

  6. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics.

    PubMed

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F X

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db.

  7. Coverage of whole proteome by structural genomics observed through protein homology modeling database

    PubMed Central

    Yamaguchi, Akihiro; Go, Mitiko

    2006-01-01

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617
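
    The coverage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only recomputes numbers stated in the abstract (genome counts per group, ORF totals); the variable names are illustrative and are not taken from FAMSBASE itself.

```python
# Figures quoted in the FAMSBASE abstract (structures released by November 2003).
genomes_by_group = {
    "archaebacterial": 17,
    "eubacterial": 130,
    "eukaryotic": 18,
    "phage": 111,
}
modeled_orfs = 368_724    # ORFs with a modeled 3D structure
predicted_orfs = 734_193  # ORFs predicted across all genomes

# The per-group genome counts should add up to the stated species total.
total_genomes = sum(genomes_by_group.values())

# Fraction of predicted ORFs that have a modeled structure.
coverage = modeled_orfs / predicted_orfs

print(f"genomes: {total_genomes}")          # genomes: 276
print(f"modeled fraction: {coverage:.1%}")  # modeled fraction: 50.2%
```

    Both results agree with the abstract: the four groups sum to the 276 genomes reported, and the modeled fraction matches the "approximately 50%" stated for the ORF products.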

  8. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  9. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu.

  10. SNPpy - Database Management for SNP Data from Genome Wide Association Studies

    PubMed Central

    Mitha, Faheem; Herodotou, Herodotos; Borisov, Nedyalko; Jiang, Chen; Yoder, Josh; Owzar, Kouros

    2011-01-01

    Background We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. Results The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. Conclusions By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It does low level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data. PMID:22039405
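
    The relational approach described in this abstract (merging genotype data across studies, then filtering by phenotype and SNP) can be illustrated with a minimal sketch. This is not SNPpy code: it swaps in Python's standard-library sqlite3 module for the SQLAlchemy and PostgreSQL stack the abstract names, and the two-table schema, the function select_calls, and all sample and SNP identifiers are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout:
# one row per sample, and one row per (sample, SNP) genotype call.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    study     TEXT NOT NULL,
    phenotype TEXT NOT NULL
);
CREATE TABLE genotype (
    sample_id     TEXT NOT NULL REFERENCES sample(sample_id),
    snp_id        TEXT NOT NULL,
    genotype_call TEXT NOT NULL,
    PRIMARY KEY (sample_id, snp_id)
);
""")

# Made-up records from two studies, loaded into the same tables.
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", [
    ("A1", "study_a", "case"),
    ("A2", "study_a", "control"),
    ("B1", "study_b", "case"),
])
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", [
    ("A1", "snp_1", "AG"),
    ("A1", "snp_2", "CC"),
    ("A2", "snp_1", "AA"),
    ("B1", "snp_1", "GG"),
    ("B1", "snp_3", "TT"),
])

def select_calls(conn, phenotype, snp_ids):
    """Return genotype calls across all studies for one phenotype and a SNP list."""
    placeholders = ",".join("?" for _ in snp_ids)
    query = (
        "SELECT s.study, s.sample_id, g.snp_id, g.genotype_call "
        "FROM genotype AS g JOIN sample AS s ON s.sample_id = g.sample_id "
        f"WHERE s.phenotype = ? AND g.snp_id IN ({placeholders}) "
        "ORDER BY s.study, s.sample_id, g.snp_id"
    )
    return conn.execute(query, [phenotype, *snp_ids]).fetchall()

rows = select_calls(conn, "case", ["snp_1"])
for row in rows:
    print(row)
```

    Keeping samples and genotype calls in separate tables means a cross-study merge is a join and patient or marker selection is a WHERE clause, which is the kind of filtering the abstract attributes to a relational design.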

  11. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a trend in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases.

  12. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGES

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; ...

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  13. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database.

    PubMed

    Engel, Stacia R; Cherry, J Michael

    2013-01-01

    The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery.

  14. Genome-wide analysis of the Zn(II)2Cys6 zinc cluster-encoding gene family in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Proteins with a Zn(II)2Cys6 domain, Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overvie...

  15. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA markers play an important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial traits. Prior approaches to the development of Short Tandem Repeat (STR) markers were time-consuming and inefficient. Recent methods involving the development of STR markers using whole genomic or transcriptomic data have gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world’s first sugarbeet whole genome marker discovery, with 145 K markers along with 5 K functional domain markers unified in a common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for a desired chromosome or location/interval, and primers can be generated using the Primer3 core, integrated at the backend. Our analyses revealed an abundance of ‘mono’ repeats (76.82%) over ‘di’ repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for DUS tests of variety identification and MAS/GAS in variety improvement programs. The present database presents a wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait

  16. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    PubMed

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database.

  17. Analysis of disease-associated objects at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G T; Smith, Jennifer R; Petri, Victoria; Lowry, Timothy F; Nigam, Rajni; Dwinell, Melinda R; Worthey, Elizabeth A; Munzenmaier, Diane H; Shimoyama, Mary; Jacob, Howard J

    2013-01-01

    The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene-disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, 'regulation of programmed cell death' was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where 'lipid metabolic process' was the most enriched term. 'Cytosol' and 'nucleus' were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with 'nucleus' annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term-annotated gene list showed enrichment in physiologically related diseases. For example, the 'regulation of blood pressure' genes were enriched with cardiovascular disease annotations, and the 'lipid metabolic process' genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological diseases by combining 'G

  18. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Billington, Richard; Dreher, Kate; Foerster, Hartmut; Fulcher, Carol A.; Holland, Timothy A.; Keseler, Ingrid M.; Kothari, Anamika; Kubo, Aya; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2014-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37 000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states. PMID:24225315

  19. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.

    PubMed

    Caspi, Ron; Altman, Tomer; Billington, Richard; Dreher, Kate; Foerster, Hartmut; Fulcher, Carol A; Holland, Timothy A; Keseler, Ingrid M; Kothari, Anamika; Kubo, Aya; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D

    2014-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37,000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states.

  20. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component.

    PubMed

    Cherry, J Michael

    2015-12-02

    An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology.

  1. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

    PubMed Central

    Schlüter, Agatha; Fourcade, Stéphane; Domènech-Estévez, Enric; Gabaldón, Toni; Huerta-Cepas, Jaime; Berthommier, Guillaume; Ripp, Raymond; Wanders, Ronald J. A.; Poch, Olivier; Pujol, Aurora

    2007-01-01

    Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections ‘Genes’, ‘Functions’, ‘Metabolic pathways’ and ‘Diseases’, that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionarily conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, and (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate systems biology approaches on the organelle. PMID:17135190

  2. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  3. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  4. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  5. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  6. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes

    PubMed Central

    Karpinka, J. Brad; Fortriede, Joshua D.; Burns, Kevin A.; James-Zorn, Christina; Ponferrada, Virgilio G.; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M.; Vize, Peter D.

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species; all of which are mapped via unique IDs and can be queried in either a species-specific or species-agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species are displayed in GBrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral. PMID:25313157

  7. Development of Database and Genomic Medicine for von Hippel-Lindau Disease in Japan

    PubMed Central

    TAKAYANAGI, Shunsaku; MUKASA, Akitake; NAKATOMI, Hirofumi; KANNO, Hiroshi; KURATSU, Jun-ichi; NISHIKAWA, Ryo; MISHIMA, Kazuhiko; NATSUME, Atushi; WAKABAYASHI, Toshihiko; HOUKIN, Kiyohiro; TERASAKA, Shunsuke; YAO, Masahiro; SHINOHARA, Nobuo; SHUIN, Taro; SAITO, Nobuhito

    2017-01-01

    von Hippel-Lindau (VHL) disease is a hereditary tumor disease in which tumors develop in multiple organs, not only as hemangioblastomas (HBs) in the central nervous system, but also as kidney tumors, pheochromocytomas, and so on. Much about the epidemiology of VHL disease remained unknown until fairly recently in Japan, leading to calls for the establishment of a VHL disease epidemiological database in Japanese. To elucidate its epidemiology in Japan, the Japanese Ministry of Health, Labour and Welfare created the VHL Disease Study Group, which was put in charge of carrying out a nationwide epidemiological survey. The survey found close to 400 Japanese VHL disease patients throughout the country. Based on those results, the VHL Disease Study Group created the VHL Disease Treatment Guideline and also a severity classification. It is thought that the prognosis of VHL disease patients can be improved by performing genetic diagnosis and careful follow-up. Accordingly, the University of Tokyo Hospital put in place an in-hospital system for implementing genomic medicine for VHL disease based on genetic diagnosis. For that system, it was especially important to establish (I) accurate genetic diagnostic techniques, (II) genetic counseling capabilities for the patients and their families, and (III) a system of cooperation among multiple departments, including urology departments, and so on. Further elucidation of the epidemiology and the development of genomic medicine are needed to improve the treatment results of VHL disease in Japan. PMID:28070114

  8. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A.; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Karp, Peter D.

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46 000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. PMID:26527732

  9. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

    PubMed

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A; Keseler, Ingrid M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Karp, Peter D

    2016-01-04

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service.

  10. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

    2010-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway-hole fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718

  11. Novel LanT Associated Lantibiotic Clusters Identified by Genome Database Mining

    PubMed Central

    Singh, Mangal; Sareen, Dipti

    2014-01-01

    Background: Frequent use of antibiotics has led to the emergence of antibiotic resistance in bacteria. Lantibiotic compounds are ribosomally synthesized antimicrobial peptides against which bacteria are not able to develop resistance, making them a good alternative to antibiotics. Nisin is the oldest and most widely used lantibiotic, applied in food preservation, and no significant resistance against it has developed. Given their antimicrobial potential and limited number, there is a need to identify novel lantibiotics. Methodology/Findings: Identification of novel lantibiotic biosynthetic clusters from an ever-increasing database of bacterial genomes can provide a major lead in this direction. To achieve this, a strategy was adopted to identify novel lantibiotic biosynthetic clusters by screening the sequenced genomes for homologs of LanT, a conserved lantibiotic transporter specific to type IB clusters. This strategy resulted in identification of 54 bacterial strains containing LanT homologs, which are not known lantibiotic producers. Of these, 24 strains were subjected to a detailed bioinformatic analysis to identify genes encoding precursor peptides, modification enzymes, immunity and quorum sensing proteins. Eight clusters having two LanM determinants, similar to haloduracin and lichenicidin, were identified, along with 13 clusters having a single LanM determinant as in the mersacidin biosynthetic cluster. Besides these, orphan LanT homologs were also identified which might be associated with novel bacteriocins encoded elsewhere in the genome. Three identified gene clusters had a C39 domain-containing LanT transporter, associated with the LanBC proteins and double glycine type precursor peptides; the only known example of such a cluster is that of salivaricin. Conclusion: This study led to the identification of 8 novel putative two-component lantibiotic clusters along with 13 having a single LanM and 3 with LanBC genes

  12. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

  13. Uncovering the Genome-Wide Transcriptional Responses of the Filamentous Fungus Aspergillus niger to Lignocellulose Using RNA Sequencing

    PubMed Central

    Gaddipati, Sanyasi; Kokolski, Matthew; Malla, Sunir; Blythe, Martin J.; Ibbett, Roger; Campbell, Maria; Liddell, Susan; Aboobaker, Aziz; Tucker, Gregory A.; Archer, David B.

    2012-01-01

    A key challenge in the production of second generation biofuels is the conversion of lignocellulosic substrates into fermentable sugars. Enzymes, particularly those from fungi, are a central part of this process, and many have been isolated and characterised. However, relatively little is known of how fungi respond to lignocellulose and produce the enzymes necessary for dis-assembly of plant biomass. We studied the physiological response of the fungus Aspergillus niger when exposed to wheat straw as a model lignocellulosic substrate. Using RNA sequencing we showed that, 24 hours after exposure to straw, gene expression of known and presumptive plant cell wall–degrading enzymes represents a huge investment for the cells (about 20% of the total mRNA). Our results also uncovered new esterases and surface interacting proteins that might form part of the fungal arsenal of enzymes for the degradation of plant biomass. Using transcription factor deletion mutants (xlnR and creA) to study the response to both lignocellulosic substrates and low carbon source concentrations, we showed that a subset of genes coding for degradative enzymes is induced by starvation. Our data support a model whereby this subset of enzymes plays a scouting role under starvation conditions, testing for available complex polysaccharides and liberating inducing sugars, that triggers the subsequent induction of the majority of hydrolases. We also showed that antisense transcripts are abundant and that their expression can be regulated by growth conditions. PMID:22912594

  14. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R.; Cherry, J. Michael

    2013-01-01

    The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186

  15. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dreher, Kate; Fulcher, Carol A.; Subhraveti, Pallavi; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Pujar, Anuradha; Shearer, Alexander G.; Travers, Michael; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2012-01-01

    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups. PMID:22102576

  16. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  17. Genome-wide transcriptional response of Trichoderma reesei to lignocellulose using RNA sequencing and comparison with Aspergillus niger

    PubMed Central

    2013-01-01

    Background A major part of second generation biofuel production is the enzymatic saccharification of lignocellulosic biomass into fermentable sugars. Many fungi produce enzymes that can saccharify lignocellulose, and cocktails from several fungi, including well-studied species such as Trichoderma reesei and Aspergillus niger, are available commercially for this process. Such commercially-available enzyme cocktails are not necessarily representative of the array of enzymes used by the fungi themselves when faced with a complex lignocellulosic material. The global induction of genes in response to exposure of T. reesei to wheat straw was explored using RNA-seq and compared to published RNA-seq data and a model of how A. niger senses and responds to wheat straw. Results In T. reesei, levels of transcript that encode known and predicted cell-wall degrading enzymes were very high after 24 h exposure to straw (approximately 13% of the total mRNA) but were less than recorded in A. niger (approximately 19% of the total mRNA). Closer analysis revealed that enzymes from the same glycoside hydrolase families but different carbohydrate esterase and polysaccharide lyase families were up-regulated in both organisms. Accessory proteins which have been hypothesised to have a role in enhancing carbohydrate deconstruction in A. niger were also uncovered in T. reesei and categories of enzymes induced were in general similar to those in A. niger. Similarly to A. niger, antisense transcripts are present in T. reesei and their expression is regulated by the growth condition. Conclusions T. reesei uses a similar array of enzymes, for the deconstruction of a solid lignocellulosic substrate, to A. niger. This suggests a conserved strategy towards lignocellulose degradation in both saprobic fungi. This study provides a basis for further analysis and characterisation of genes shown to be highly induced in the presence of a lignocellulosic substrate. The data will help to elucidate the...

  18. PGAdb-builder: A web service tool for creating pan-genome allele database for molecular fine typing

    PubMed Central

    Liu, Yen-Yi; Chiou, Chien-Shun; Chen, Chih-Chieh

    2016-01-01

    With the advance of next generation sequencing techniques, whole genome sequencing (WGS) is expected to become the optimal method for molecular subtyping of bacterial isolates. To use WGS as a general subtyping method for disease outbreak investigation and surveillance, the layout of WGS-based typing must be comparable among laboratories. Whole genome multilocus sequence typing (wgMLST) is an approach that achieves this requirement. To apply wgMLST as a standard subtyping approach, a pan-genome allele database (PGAdb) for the population of a bacterial organism must first be established. We present a free web service tool, PGAdb-builder (http://wgmlstdb.imst.nsysu.edu.tw), for the construction of bacterial PGAdb. The effectiveness of PGAdb-builder was tested by constructing a pan-genome allele database for Salmonella enterica serovar Typhimurium, with the database being applied to create a wgMLST tree for a panel of epidemiologically well-characterized S. Typhimurium isolates. The performance of the wgMLST-based approach was as high as that of the SNP-based approach in Leekitcharoenphon’s study used for discerning among epidemiologically related and non-related isolates. PMID:27824078
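
    The abstract above does not spell out how wgMLST profiles are compared, so the following is only a minimal sketch of one common idea: count the loci at which two isolates carry different alleles. The isolate names, locus names and allele numbers are hypothetical, and the choice to skip loci missing from either profile is an assumption made here, not a description of PGAdb-builder.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci present in both profiles whose allele IDs differ.

    Loci missing from either profile are ignored (an assumption of this sketch).
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def distance_matrix(profiles):
    """Return a symmetric dict mapping (isolate, isolate) to allele distance."""
    names = sorted(profiles)
    dist = {(name, name): 0 for name in names}
    for a, b in combinations(names, 2):
        d = allele_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical allele profiles: locus name -> allele number.
profiles = {
    "isolate_1": {"locusA": 1, "locusB": 4, "locusC": 2},
    "isolate_2": {"locusA": 1, "locusB": 4, "locusC": 3},
    "isolate_3": {"locusA": 2, "locusB": 5},
}
matrix = distance_matrix(profiles)
```

    A matrix of this kind could then be passed to any standard clustering or tree-building routine; which one the authors used is not stated in the abstract.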

  19. A Genome-Scale Database and Reconstruction of Caenorhabditis elegans Metabolism.

    PubMed

    Gebauer, Juliane; Gentsch, Christoph; Mansfeld, Johannes; Schmeißer, Kathrin; Waschina, Silvio; Brandes, Susanne; Klimmasch, Lukas; Zamboni, Nicola; Zarse, Kim; Schuster, Stefan; Ristow, Michael; Schäuble, Sascha; Kaleta, Christoph

    2016-05-25

    We present a genome-scale model of Caenorhabditis elegans metabolism along with the public database ElegCyc (http://elegcyc.bioinf.uni-jena.de:1100), which represents a reference for metabolic pathways in the worm and allows for the visualization as well as analysis of omics datasets. Our model reflects the metabolic peculiarities of C. elegans that make it distinct from other higher eukaryotes and mammals, including mice and humans. We experimentally verify one of these peculiarities by showing that the lifespan-extending effect of L-tryptophan supplementation is dose dependent (hormetic). Finally, we show the utility of our model for analyzing omics datasets through predicting changes in amino acid concentrations after genetic perturbations and analyzing metabolic changes during normal aging as well as during two distinct, reactive oxygen species (ROS)-related lifespan-extending treatments. Our analyses reveal a notable similarity in metabolic adaptation between distinct lifespan-extending interventions and point to key pathways affecting lifespan in nematodes.

  20. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    PubMed Central

    2011-01-01

    Background Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets. PMID:22192575

  1. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus.

    PubMed

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potent carcinogenic metabolites produced by the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp as compared to seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were more highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding, potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research.

  2. Genome-wide transcriptome analysis of cotton (Gossypium hirsutum L.) identifies candidate gene signatures in response to aflatoxin producing fungus Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic metabolites and potent carcinogens produced by the asexual fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. U.S. federal regulations restrict the use of aflatoxin contaminated cottonseed at >20...

  3. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database.

    PubMed

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome's content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized for the numbers of repeats present, their overall length and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called "Cynara cardunculus MicroSatellite DataBase" (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates.
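
    To make the notions of a perfect SSR and of SSR density concrete, here is a small regular-expression sketch. It is not the search tool used in the study: the unit-length range (2 to 6 nt), the minimum of four repeats and the example sequence are all assumptions chosen for illustration. The genome length used in the density check (about 724 Mbp) is back-calculated from the two figures quoted in the abstract and is not itself stated there.

```python
import re


def find_perfect_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Yield (start, unit, repeat_count) for perfect tandem repeats.

    Simplification of this sketch: a homopolymer run such as AAAAAAAA is
    reported with a two-base unit ("AA"), because units shorter than
    min_unit are not searched for.
    """
    # The lazy quantifier makes the regex try the shortest unit first.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


def ssr_density_per_mbp(n_ssrs, length_bp):
    """SSRs per million base pairs."""
    return n_ssrs / (length_bp / 1_000_000)


# Made-up sequence containing five copies of the dinucleotide AT.
hits = list(find_perfect_ssrs("GGATATATATATCC"))

# 177,000 SSRs over an assumed ~724 Mbp gives roughly the quoted density.
density = ssr_density_per_mbp(177_000, 724_000_000)
```

    Imperfect and complex SSRs, which the abstract also reports, would need a mismatch-tolerant search and are outside the scope of this sketch.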

  4. TRIPATH: A Biological Genetic and Genomic Database of Three Economically Important Fungal Pathogen of Wheat – Rust: Smut: Bunt

    PubMed Central

    Garg, Swati; Pandey, Dinesh; Taj, Gohar; Goel, Anshita; Kumar, Anil

    2014-01-01

    Wheat, the major source of vegetable protein in the human diet, provides staple food globally for a large proportion of the human population. With higher protein content than other major cereals, wheat has great socio-economic importance. Nonetheless, three important fungal pathogens of wheat, i.e. rust, smut and bunt, are a major cause of significant yield losses throughout the world. Researchers are putting up a strong fight against devastating wheat pathogens, and have made progress in tracking and controlling disease outbreaks from East Africa to South Asia. The aim of the present work hence was to develop a fungal pathogens database dedicated to wheat, gathering information about different pathogen species and linking them to their biological classification, distribution and control. Towards this end, we developed an open access database Tripath: A biological, genetic and genomic database of economically important wheat fungal pathogens – rust: smut: bunt. Data collected from peer-reviewed publications and fungal pathogens were added to the customizable database through an extended relational design. The strength of this resource is in providing rapid retrieval of information from large volumes of text at a high degree of accuracy. Database TRIPATH is freely accessible. Availability http://www.gbpuat-cbsh.ac.in/departments/bi/database/tripath/ PMID:25187689

  5. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

    PubMed Central

    Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.

    2010-01-01

    The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719

  6. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser.

  7. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits

    PubMed Central

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C. J.; Gonnet, Gaston H.

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings. PMID:16835308

  8. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  9. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

    PubMed

    Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.

  10. The FlyBase database of the Drosophila genome projects and community literature

    SciTech Connect

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang, Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder, Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner, Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn, Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman, Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis, Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik, Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy, Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  11. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.

  12. Fungal plant cell wall-degrading enzyme database: a platform for comparative and evolutionary genomics in fungi and Oomycetes

    PubMed Central

    2013-01-01

    Background Plant cell wall-degrading enzymes (PCWDEs) play significant roles throughout the fungal life including acquisition of nutrients and decomposition of plant cell walls. In addition, many PCWDEs are also utilized by biofuel and pulp industries. In order to develop a comparative genomics platform focused on fungal PCWDEs and provide a resource for evolutionary studies, the Fungal PCWDE Database (FPDB) was constructed (http://pcwde.riceblast.snu.ac.kr/). Results In order to archive fungal PCWDEs, 22 sequence profiles were constructed and searched on 328 genomes of fungi, Oomycetes, plants and animals. A total of 6,682 putative genes encoding PCWDEs were predicted, showing differential distribution by their life styles, host ranges and taxonomy. Genes known to be involved in fungal pathogenicity, including polygalacturonase (PG) and pectin lyase, were enriched in plant pathogens. Furthermore, crop pathogens had more PCWDEs than those of rot fungi, implying that the PCWDEs analysed in this study are more needed for invading plant hosts than wood-decaying processes. Evolutionary analysis of PGs in 34 selected genomes revealed that gene duplication and loss events were mainly driven by taxonomic divergence and partly contributed by those events in species-level, especially in plant pathogens. Conclusions The FPDB would provide a fungi-specialized genomics platform, a resource for evolutionary studies of PCWDE gene families and extended analysis option by implementing Favorite, which is a data exchange and analysis hub built in Comparative Fungal Genomics Platform (CFGP 2.0; http://cfgp.snu.ac.kr/). PMID:24564786

  13. VitisExpDB: A Database Resource for Grape Functional Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for Vitis vinifera and non-vinifera grape varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation details and gene ontology based...

  14. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2007-01-01

    NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources when additional data are available from those sources, and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

  15. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life.

    PubMed

    Elbourne, Liam D H; Tetu, Sasha G; Hassan, Karl A; Paulsen, Ian T

    2017-01-04

    All cellular life contains an extensive array of membrane transport proteins. The vast majority of these transporters have not been experimentally characterized. We have developed a bioinformatic pipeline to identify and annotate complete sets of transporters in any sequenced genome. This pipeline is now fully automated enabling it to better keep pace with the accelerating rate of genome sequencing. This manuscript describes TransportDB 2.0 (http://www.membranetransport.org/transportDB2/), a completely updated version of TransportDB, which provides access to the large volumes of data generated by our automated transporter annotation pipeline. The TransportDB 2.0 web portal has been rebuilt to utilize contemporary JavaScript libraries, providing a highly interactive interface to the annotation information, and incorporates analysis tools that enable users to query the database on a number of levels. For example, TransportDB 2.0 includes tools that allow users to select annotated genomes of interest from the thousands of species held in the database and compare their complete transporter complements.

  16. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life

    PubMed Central

    Elbourne, Liam D. H.; Tetu, Sasha G.; Hassan, Karl A.; Paulsen, Ian T.

    2017-01-01

    All cellular life contains an extensive array of membrane transport proteins. The vast majority of these transporters have not been experimentally characterized. We have developed a bioinformatic pipeline to identify and annotate complete sets of transporters in any sequenced genome. This pipeline is now fully automated enabling it to better keep pace with the accelerating rate of genome sequencing. This manuscript describes TransportDB 2.0 (http://www.membranetransport.org/transportDB2/), a completely updated version of TransportDB, which provides access to the large volumes of data generated by our automated transporter annotation pipeline. The TransportDB 2.0 web portal has been rebuilt to utilize contemporary JavaScript libraries, providing a highly interactive interface to the annotation information, and incorporates analysis tools that enable users to query the database on a number of levels. For example, TransportDB 2.0 includes tools that allow users to select annotated genomes of interest from the thousands of species held in the database and compare their complete transporter complements. PMID:27899676

  17. Genome-related datasets within the E. coli Genetic Stock Center database.

    PubMed Central

    Berlyn, M B; Letovsky, S

    1992-01-01

    The contents of the E. coli Genetic Stock Center database and the availability in electronic form of the subset of information most relevant to sequence databases are described. The database uses the long-standing Stock Center records (developed and curated by Dr B.J.Bachmann) in describing genotypes of mutant derivatives of E.coli K-12 in terms of alleles, structural mutations, mating type, and plasmids as well as the derivation, names and originators of the strain, and references. The database includes descriptions of mutations, mutation properties, genes, gene properties, and gene products, with EC number identifiers for enzymes. Sequence information is not included, but entries refer to sequence database accession numbers for sequenced regions. A gene is described as a subtype of a more general category of chromosome interval called Site. Since sites are used to describe any chromosomal interval, mapping information is associated with sites. Alleles are described as mutations of those sites and they are not primary map objects, but inherit map position information from the corresponding site description. The database design is intended to preserve richness of detail where it is known and uncertainty of measurements or information as it occurs in order to represent the stock center records as accurately as possible. PMID:1475178

  18. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database

    PubMed Central

    Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M.; Brown, Eric W.; Timme, Ruth

    2016-01-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877

  19. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    PubMed Central

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized for the numbers of repeats present, their overall length and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830
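
    The idea of cataloguing perfect SSRs and reporting a density in SSRs/Mbp can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the repeat thresholds (at least 6 dinucleotide, 5 trinucleotide or 4 tetranucleotide repeats) are assumptions chosen for illustration, and only one strand is scanned.

```python
import re


def find_perfect_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    # Hypothetical thresholds: motif length -> minimum number of repeats.
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-nt motif followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each tract is counted once.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def density_per_mbp(n_ssrs, length_bp):
    """SSR density expressed as SSRs per million base pairs."""
    return n_ssrs / (length_bp / 1_000_000)
```

    As a rough consistency check on the figures quoted in the abstract, 177,000 perfect SSRs at 244.5 SSRs/Mbp corresponds to about 724 Mbp of sequence (177,000 / 244.5 ≈ 724).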

  20. Genome-Wide Enzyme Annotation with Precision Control: Catalytic Families (CatFam) Databases

    DTIC Science & Technology

    2008-01-01

    classification, decision trees, association rules, neural networks,18 and support vector machines,19,20 to classify protein catalytic functions using various...genomes To evaluate the performance of CatFam for whole genome annotation, we select two Yersinia genomes [Y. pestis mediaevalis (ypm) and Y...pestis and F. tularensis Organism Annotation Number of enzyme-catalyzed reactions Number of predicted pathways Number of pathways with holes Yersinia

  1. Aspergillus niger contains the cryptic phylogenetic species A. awamori.

    PubMed

    Perrone, Giancarlo; Stea, Gaetano; Epifani, Filomena; Varga, János; Frisvad, Jens C; Samson, Robert A

    2011-11-01

    Aspergillus section Nigri is an important group of species for food and medical mycology, and biotechnology. The Aspergillus niger 'aggregate' represents its most complicated taxonomic subgroup containing eight morphologically indistinguishable taxa: A. niger, Aspergillus tubingensis, Aspergillus acidus, Aspergillus brasiliensis, Aspergillus costaricaensis, Aspergillus lacticoffeatus, Aspergillus piperis, and Aspergillus vadensis. Aspergillus awamori, first described by Nakazawa, has been compared taxonomically with other black aspergilli and recently it has been treated as a synonym of A. niger. Phylogenetic analyses of sequences generated from portions of three genes coding for the proteins β-tubulin (benA), calmodulin (CaM), and the translation elongation factor-1 alpha (TEF-1α) of a population of A. niger strains isolated from grapes in Europe revealed the presence of a cryptic phylogenetic species within this population, A. awamori. Morphological, physiological, ecological and chemical data overlapped between A. niger and the cryptic A. awamori; however, the splitting of these two species was also supported by AFLP analysis of the full genome. Isolates in both phylospecies can produce the mycotoxins ochratoxin A and fumonisin B₂, and they also share the production of pyranonigrin A, tensidol B, funalenone, malformins, and naphtho-γ-pyrones. In addition, sequence analysis of four putative A. awamori strains from Japan, used in the koji industrial fermentation, revealed that none of these strains belong to the A. awamori phylospecies.

  2. The Littorina sequence database (LSD)--an online resource for genomic data.

    PubMed

    Canbäck, Björn; André, Carl; Galindo, Juan; Johannesson, Kerstin; Johansson, Tomas; Panova, Marina; Tunlid, Anders; Butlin, Roger

    2012-01-01

    We present an interactive, searchable expressed sequence tag database for the periwinkle snail Littorina saxatilis, an upcoming model species in evolutionary biology. The database is the result of a hybrid assembly between Sanger and 454 sequences, 1290 and 147,491 sequences respectively. Normalized and non-normalized cDNA was obtained from different ecotypes of L. saxatilis collected in the UK and Sweden. The Littorina sequence database (LSD) contains 26,537 different contigs, of which 2453 showed similarity with annotated proteins in UniProt. Querying the LSD permits the selection of the taxonomic origin of blast hits for each contig, and the search can be restricted to particular taxonomic groups. The database allows access to UniProt annotations, blast output, protein family domains (PFAM) and Gene Ontology. The database will allow users to search for genetic markers and to identify candidate genes or genes for expression analyses. It is open for additional deposition of sequence information for L. saxatilis and other species of the genus Littorina. The LSD is available at http://mbio-serv2.mbioekol.lu.se/Littorina/.

  3. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  4. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database.

    PubMed

    Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu.

  5. The Disease Portals, disease–gene annotation and the RGD disease ontology at the Rat Genome Database

    PubMed Central

    Hayman, G. Thomas; Laulederkind, Stanley J. F.; Smith, Jennifer R.; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene–disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu PMID:27009807

  6. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation

    PubMed Central

    Kristensen, David M.; Wolf, Yuri I.; Koonin, Eugene V.

    2017-01-01

    The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of ‘index’ orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. PMID:28053163

  7. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    DOE PAGES

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.

  8. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    SciTech Connect

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.

  9. Islander: a database of precisely mapped genomic islands in tRNA and tmRNA genes

    PubMed Central

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2015-01-01

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution. PMID:25378302

  10. Islander: a database of precisely mapped genomic islands in tRNA and tmRNA genes.

    PubMed

    Hudson, Corey M; Lau, Britney Y; Williams, Kelly P

    2015-01-01

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.
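
    The search principle described in the Islander abstracts (locate a tDNA, then look for a displaced fragment of it in the same replicon, then filter) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the published algorithm: the function name, the use of exact string matching on one strand, the choice of the last 15 nt of the tDNA as the fragment, and the single size filter are all hypothetical.

```python
def candidate_islands(replicon, tdna_start, tdna_end,
                      min_frag=15, max_island=200_000):
    """Toy search for candidate islands bounded by a tDNA and its fragment.

    replicon   -- DNA sequence of one replicon (str)
    tdna_start -- 0-based start of the intact tDNA
    tdna_end   -- end of the intact tDNA (exclusive)
    """
    tdna = replicon[tdna_start:tdna_end]
    # Assumption: the displaced fragment is an exact copy of the tDNA's end.
    frag = tdna[-min_frag:]
    candidates = []
    pos = replicon.find(frag, tdna_end)
    while pos != -1:
        frag_end = pos + len(frag)
        # Single illustrative filter: discard implausibly long candidates.
        if frag_end - tdna_start <= max_island:
            candidates.append((tdna_start, frag_end))
        pos = replicon.find(frag, pos + 1)
    return candidates
```

    A real implementation would also need to search both strands, tolerate mismatches in the fragment, and apply further filters such as the integrase requirement mentioned in the abstract.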

  11. An Integrated Database for Grass and Endophyte Genomics at www.grassendophyte.org

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The endophytic microbes are able to promote plant growth and health under various stresses via their symbiotic association with host plants. Genome-wide comparative analysis has been extensively employed to decipher complex mechanisms of interactions between endophytic microbes and host plants, resu...

  12. Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element.

    PubMed

    Koga, Akihiko

    2012-02-01

    It is widely accepted in a conceptual framework that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases, because of the difficulty of assembling sequence reads into contigs. Although this is easily inferred, there is no quantitative illustration of this phenomenon. An example using a currently used database is expected to contribute to the intuitive understanding of how serious the under-representation is. The present study provides the first quantitative example (in the case of 16 copies of virtually identical, 4.7-kb sequences in a genome of 7 × 10^8 bp) by comparing the results of BLAST searches of a sequence database (contig N50; 9.8 kb) with those of Southern blot analysis of genomic DNA. This has revealed that the internal regions of the repetitive sequences are under-represented to a striking extent.

  13. The University of Minnesota Biocatalysis/Biodegradation Database: post-genomic data mining.

    PubMed

    Ellis, Lynda B M; Hou, Bo Kyeng; Kang, Wenjun; Wackett, Lawrence P

    2003-01-01

    The University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/) provides curated information on microbial catabolism and related biotransformations, primarily for environmental pollutants. Currently, it contains information on over 130 metabolic pathways, 800 reactions, 750 compounds and 500 enzymes. In the past two years, it has increased its breadth to include more examples of microbial metabolism of metals and metalloids; and expanded the types of information it includes to contain microbial biotransformations of, and binding interactions with many chemical elements. It has also increased the ways in which this data can be accessed (mined). Structure-based searching was added, for exact matches, similarity, or substructures. Analysis of UM-BBD reactions has led to a prototype, guided, pathway prediction system. Guided prediction means that the user is shown all possible biotransformations at each step and guides the process to its conclusion. Mining the UM-BBD's data provides a unique view into how the microbial world recycles organic functional groups. UM-BBD users are encouraged to comment on all aspects of the database, including the information it contains and the tools by which it can be mined. The database and prediction system develop under the direction of the scientific community.

  14. Citrus sinensis Annotation Project (CAP): A Comprehensive Database for Sweet Orange Genome

    PubMed Central

    Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/. PMID:24489955

  15. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  16. Aspergillus spinal epidural abscess

    SciTech Connect

    Byrd, B.F. III; Weiner, M.H.; McGee, Z.A.

    1982-12-17

    A spinal epidural abscess developed in a renal transplant recipient; results of a serum radioimmunoassay for Aspergillus antigen were positive. Laminectomy disclosed an abscess of the L4-5 interspace and L-5 vertebral body that contained hyphal forms and from which Aspergillus species was cultured. Serum Aspergillus antigen radioimmunoassay may be a valuable, specific early diagnostic test when systemic aspergillosis is a consideration in an immunosuppressed host.

  17. GELBANK : A database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes.

    SciTech Connect

    Babnigg, G.; Giometti, C. S.; Biosciences Division

    2004-01-01

    GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyrococcus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans.

  18. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline.

    PubMed

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-11-01

    MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.

  19. MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline

    PubMed Central

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-01-01

    MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518

  20. Aspergillus tubingensis and Aspergillus niger as the dominant black Aspergillus, use of simple PCR-RFLP for preliminary differentiation.

    PubMed

    Mirhendi, H; Zarei, F; Motamedi, M; Nouripour-Sisakht, S

    2016-03-01

    This work aimed to identify the species distribution of common clinical and environmental isolates of black Aspergilli based on simple restriction fragment length polymorphism (RFLP) analysis of the β-tubulin gene. A total of 149 clinical and environmental strains of black Aspergilli were collected and subjected to preliminary morphological examination. Total genomic DNAs were extracted, and PCR was performed to amplify part of the β-tubulin gene. At first, 52 randomly selected samples were species-delineated by sequence analysis. In order to distinguish the most common species, PCR amplicons of 117 black Aspergillus strains were identified by simple PCR-RFLP analysis using the enzyme TasI. Among 52 sequenced isolates, 28 were Aspergillus tubingensis, 21 Aspergillus niger, and the three remaining isolates included Aspergillus uvarum, Aspergillus awamori, and Aspergillus acidus. All 100 environmental and 17 BAL samples subjected to TasI-RFLP analysis of the β-tubulin gene, fell into two groups, consisting of about 59% (n=69) A. tubingensis and 41% (n=48) A. niger. Therefore, the method successfully and rapidly distinguished A. tubingensis and A. niger as the most common species among the clinical and environmental isolates. Although tardy, the Ehrlich test was also able to differentiate A. tubingensis and A. niger according to the yellow color reaction specific to A. niger. A. tubingensis and A. niger are the most common black Aspergillus in both clinical and environmental isolates in Iran. PCR-RFLP using TasI digestion of β-tubulin DNA enables rapid screening for these common species.

  1. Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.

    PubMed

    Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles

    2016-12-19

    The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing.
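
    The JSON-to-SIF conversion mentioned in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the Java used by the authors' parser, and the record fields (`source`, `relation`, `target`) are hypothetical, not the Recon2Neo4j JSON schema. It assumes each query result describes one edge and emits one tab-separated SIF line per edge.

```python
import json


def json_edges_to_sif(json_text):
    """Convert a JSON list of edge records into SIF text.

    Each record is assumed (hypothetically) to carry 'source',
    'relation' and 'target' keys; SIF stores one interaction per
    line as: source <tab> relation <tab> target.
    """
    records = json.loads(json_text)
    lines = []
    for rec in records:
        lines.append(f"{rec['source']}\t{rec['relation']}\t{rec['target']}")
    return "\n".join(lines)


# Toy query result with invented identifiers, for illustration only.
example = json.dumps([
    {"source": "glucose", "relation": "consumed_by", "target": "R_HEX1"},
    {"source": "R_HEX1", "relation": "produces", "target": "g6p"},
])
sif_text = json_edges_to_sif(example)
print(sif_text)
```

    A conversion to SBML would need considerably more structure (compartments, species, stoichiometry), so it is not sketched here.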

  2. Aspergillus fumigatus and Aspergillosis

    PubMed Central

    Latgé, Jean-Paul

    1999-01-01

    Aspergillus fumigatus is one of the most ubiquitous of the airborne saprophytic fungi. Humans and animals constantly inhale numerous conidia of this fungus. The conidia are normally eliminated in the immunocompetent host by innate immune mechanisms, and aspergilloma and allergic bronchopulmonary aspergillosis, uncommon clinical syndromes, are the only infections observed in such hosts. Thus, A. fumigatus was considered for years to be a weak pathogen. With increases in the number of immunosuppressed patients, however, there has been a dramatic increase in severe and usually fatal invasive aspergillosis, now the most common mold infection worldwide. In this review, the focus is on the biology of A. fumigatus and the diseases it causes. Included are discussions of (i) genomic and molecular characterization of the organism, (ii) clinical and laboratory methods available for the diagnosis of aspergillosis in immunocompetent and immunocompromised hosts, (iii) identification of host and fungal factors that play a role in the establishment of the fungus in vivo, and (iv) problems associated with antifungal therapy. PMID:10194462

  3. Allergens/Antigens, toxins and polyketides of important Aspergillus species.

    PubMed

    Bhetariya, Preetida J; Madan, Taruna; Basir, Seemi Farhat; Varma, Anupam; Usha, Sarma P

    2011-04-01

    The medical, agricultural and biotechnological importance of the primitive eukaryotic microorganisms, the fungi, was recognized as early as 1920. Among various groups of fungi, the Aspergillus species are studied in great detail using advances in genomics and proteomics to unravel biological and molecular mechanisms in these fungi. Aspergillus fumigatus, Aspergillus flavus, Aspergillus niger, Aspergillus parasiticus, Aspergillus nidulans and Aspergillus terreus are some of the important species relevant to human, agricultural and biotechnological applications. The potential of Aspergillus species to produce highly diversified complex biomolecules such as multifunctional proteins (allergens, antigens, enzymes) and polyketides is fascinating and demands greater insight into the understanding of these fungal species for application to human health. Recently a regulator gene for secondary metabolites, LaeA, has been identified. Gene mining based on LaeA has facilitated the discovery of new metabolites from A. nidulans, such as emericellamides with antimicrobial activity and terrequinone A with antitumor activity. An immunoproteomic approach was reported for identification of a few novel allergens for A. fumigatus. In this context, the review is focused on recent developments in allergens, antigens, and the structural and functional diversity of the polyketide synthases that produce polyketides of pharmaceutical and biological importance. Possible antifungal drug targets for development of effective antifungal drugs and new strategies for development of molecular diagnostics are considered.

  4. DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes.

    PubMed

    Wang, Dapeng

    2016-01-15

    The conservation of gene organization in the genome with lineage-specificity is an invaluable resource to decipher their potential functionality with diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal structure for such kind of gene clusters that tend to reside in their preferred locations, representing the distinctive genomic characteristics in single species or a given lineage. Despite gene families having been investigated in a widespread manner, the definition of gene pair families in various taxa still lacks adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/) that stores the pre-calculated lineage-based gene pairs in currently available 134 animal and plant genomes and inspect them under the same analytical framework, bringing out a set of innovational features. First, the taxonomy or lineage has been classified into four levels such as Kingdom, Phylum, Class and Order. It adopts all-to-all comparison strategy to identify the possible conserved gene pairs in all species for each gene pair in certain species and reckon those that are conserved in over a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as the lineage-conserved gene pairs. Furthermore, it predicts the lineage-specific gene pairs by retaining the above-mentioned lineage-conserved gene pairs that are not conserved in any other lineages. Second, it carries out pairwise comparison for the gene pairs between two compared species and creates the table including all the conserved gene pairs and the image elucidating the conservation degree of gene pairs in chromosomal level. Third, it supplies gene order browser to extend gene pairs to gene clusters, allowing users to view the evolution dynamics in the gene context in an intuitive manner. This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and
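
    The two definitions in this abstract (lineage-conserved and lineage-specific gene pairs) can be sketched as set operations. This is an illustrative reading, not DLGP's implementation: the 0.5 threshold stands in for the unspecified "significant proportion", "not conserved in any other lineages" is interpreted as "not lineage-conserved in any other lineage", and all species and gene names are invented.

```python
def lineage_conserved_pairs(presence, lineage_species, min_fraction=0.5):
    """Return gene pairs conserved in more than min_fraction of a lineage.

    presence maps a gene pair to the set of species in which that
    pair is conserved; min_fraction is an assumed cutoff.
    """
    conserved = set()
    for pair, species in presence.items():
        fraction = len(species & lineage_species) / len(lineage_species)
        if fraction > min_fraction:
            conserved.add(pair)
    return conserved


def lineage_specific_pairs(presence, lineages, target, min_fraction=0.5):
    """Keep pairs conserved in the target lineage but in no other lineage."""
    specific = lineage_conserved_pairs(presence, lineages[target], min_fraction)
    for name, members in lineages.items():
        if name != target:
            specific -= lineage_conserved_pairs(presence, members, min_fraction)
    return specific


# Toy data: two lineages and three gene pairs (all names hypothetical).
lineages = {
    "Primates": {"human", "chimp", "macaque"},
    "Diptera": {"fly", "mosquito"},
}
presence = {
    ("geneA", "geneB"): {"human", "chimp", "macaque"},
    ("geneC", "geneD"): {"human", "chimp", "fly", "mosquito"},
    ("geneE", "geneF"): {"fly"},
}
primate_specific = lineage_specific_pairs(presence, lineages, "Primates")
print(primate_specific)
```

    In this toy data only the geneA/geneB pair is Primates-specific, because geneC/geneD also passes the cutoff in Diptera.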

  5. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9

    PubMed Central

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-01-01

    Motivation: CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. Results: We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability and implementation: Free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153724

  6. Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants.

    PubMed

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A; Muller, Robert; Rhee, Seung Yon

    2010-08-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).

  7. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  8. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    SciTech Connect

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  9. Multi-Sample Pooling and Illumina Genome Analyzer Sequencing Methods to Determine Gene Sequence Variation for Database Development

    PubMed Central

    Margraf, Rebecca L.; Durtschi, Jacob D.; Dames, Shale; Pattison, David C.; Stephens, Jack E.; Mao, Rong; Voelkerding, Karl V.

    2010-01-01

    Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13–16. Ten samples with 13 known unique variants (“singleton variants” within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single “control” sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3′ read end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10 pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%±0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30, 40, and 50 sample pool data, expected singleton allele frequencies within pools, and variant detection methods; ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing. PMID:20808642
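
    The "expected 5% allele frequency" for a 10-sample pool follows from simple arithmetic, sketched below. The sketch assumes diploid samples, perfectly equimolar pooling, and a singleton variant that is heterozygous in exactly one sample; the function name is illustrative and not taken from the study.

```python
def expected_singleton_frequency(n_samples, ploidy=2):
    """Expected fraction of variant reads for a heterozygous singleton.

    One variant allele out of (ploidy * n_samples) alleles in the pool,
    assuming each sample contributes equally.
    """
    return 1.0 / (ploidy * n_samples)


# Pool sizes mentioned in the abstract: 10 (sequenced) and 30, 40, 50 (simulated).
for n in (10, 30, 40, 50):
    print(n, f"{100 * expected_singleton_frequency(n):.2f}%")
```

    Under these assumptions a 10-sample pool gives 1/20 = 5%, and the expected fraction falls as pool size grows, which is why the background error rate limits how many samples can be pooled.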

  10. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  11. Development in Aspergillus

    PubMed Central

    Krijgsheld, P.; Bleichrodt, R.; van Veluw, G.J.; Wang, F.; Müller, W.H.; Dijksterhuis, J.; Wösten, H.A.B.

    2013-01-01

    The genus Aspergillus represents a diverse group of fungi that are among the most abundant fungi in the world. Germination of a spore can lead to a vegetative mycelium that colonizes a substrate. The hyphae within the mycelium are highly heterogeneous with respect to gene expression, growth, and secretion. Aspergilli can reproduce both asexually and sexually. To this end, conidiophores and ascocarps are produced that form conidia and ascospores, respectively. This review describes the molecular mechanisms underlying growth and development of Aspergillus. PMID:23450714

  12. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    PubMed

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  13. The PlaNet Consortium: A Network of European Plant Databases Connecting Plant Genome Data in an Integrated Biological Knowledge Resource

    PubMed Central

    Ernst, R.; Mayer, K. F. X.

    2004-01-01

    The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project. PMID:18629059

  14. The PlaNet Consortium: a network of European plant databases connecting plant genome data in an integrated biological knowledge resource.

    PubMed

    Schoof, H; Ernst, R; Mayer, K F X

    2004-01-01

    The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.

  15. ANCUT2, a Thermo-alkaline Cutinase from Aspergillus nidulans and Its Potential Applications.

    PubMed

    Bermúdez-García, Eva; Peña-Montes, Carolina; Castro-Rodríguez, José Augusto; González-Canto, Augusto; Navarro-Ocaña, Arturo; Farrés, Amelia

    2017-01-25

    Biochemical characterization of purified ANCUT2 cutinase from Aspergillus nidulans is described. The identified amino acid sequence differs from that predicted in Aspergillus genomic databases in amino acids not relevant for catalysis. The enzyme is thermo-alkaline, showing its maximum activity at pH 9 and 60 °C, and it retains more than 60% of its initial activity after incubation for 1 h at 60 °C for pH values between 6 and 10. ANCUT2 is more active towards long-chain esters and it hydrolyzes cutin; however, it also hydrolyzes short-chain esters. The cutinase is inhibited by metal ions, PMSF, SDS, and EDTA (10 mM). It retains 50% of its activity in most of the solvents tested, although it is more stable in hydrophobic solvents. Consistent with these biochemical properties, preliminary assays demonstrate its ability to synthesize methyl esters from sesame oil, and the most likely application of this enzyme remains in detergent formulations.

  16. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  17. Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments

    PubMed Central

    Kapushesky, Misha; Adamusiak, Tomasz; Burdett, Tony; Culhane, Aedin; Farne, Anna; Filippov, Alexey; Holloway, Ele; Klebanov, Andrey; Kryvych, Nataliya; Kurbatova, Natalja; Kurnosov, Pavel; Malone, James; Melnichuk, Olga; Petryszak, Robert; Pultsin, Nikolay; Rustici, Gabriella; Tikhonov, Andrew; Travillian, Ravensara S.; Williams, Eleanor; Zorin, Andrey; Parkinson, Helen; Brazma, Alvis

    2012-01-01

    Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species, which contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies. PMID:22064864

  18. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach.

    PubMed

    Yang, Xue-Dong; Tan, Hua-Wei; Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiling was also performed with RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools.
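
    The FPKM method named in this abstract normalizes a gene's fragment count by gene length and sequencing depth. The sketch below uses the standard definition (fragments per kilobase of transcript per million mapped fragments); the input numbers are invented and do not come from SpinachDB.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to count / (length_kb * depth_millions), written here
    as count * 1e9 / (length_bp * total_fragments).
    """
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2000 bp long, 20 million mapped fragments.
value = fpkm(500, 2000, 20_000_000)
print(value)
```

    Because the length and depth terms cancel out library-size and gene-size effects, FPKM values for different genes within one sample can be placed on a common scale.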

  19. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach

    PubMed Central

    Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiling was also performed with RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools. PMID:27148975

  20. Biomarkers of Aspergillus spores

    NASA Astrophysics Data System (ADS)

    Sulc, Miroslav; Peslova, Katerina; Zabka, Martin; Hajduch, Marian; Havlicek, Vladimir

    2009-02-01

    We applied both matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) mass spectrometric and 1D sodium dodecylsulfate polyacrylamide gel electrophoretic (1D-PAGE) approaches for direct analysis of intact fungal spores of twenty-four Aspergillus species. In parallel, we optimized various protocols for protein extraction from Aspergillus spores using acidic conditions, step organic gradient and variable sonication treatment. The MALDI-TOF mass spectra obtained from optimally prepared samples provided a reproducible fingerprint demonstrating the capability of the MALDI-TOF approach to type and characterize different fungal strains within the Aspergillus genus. Mass spectra of intact fungal spores provided signals mostly below 20 kDa. The minimum material amount represented 0.3 μg (10,000 spores). Proteins with higher molecular weight were detected by 1D-PAGE. Eleven proteins were identified from three selected strains in the range 5-25 kDa by the proteomic approach. Hemolysin and hydrophobin have the highest relevance in host-pathogen interactions.

  1. A new single-nucleotide polymorphisms database for rainbow trout generated through whole genome resequencing of selected samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  2. Modern taxonomy of biotechnologically important Aspergillus and Penicillium species.

    PubMed

    Houbraken, Jos; de Vries, Ronald P; Samson, Robert A

    2014-01-01

    Taxonomy is a dynamic discipline and name changes of fungi with biotechnological, industrial, or medical importance are often difficult to understand for researchers in the applied field. Species belonging to the genera Aspergillus and Penicillium are commonly used or isolated, and inadequate taxonomy or uncertain nomenclature of these genera can therefore lead to tremendous confusion. Misidentification of strains used in biotechnology can be traced back to (1) recent changes in nomenclature, (2) new taxonomic insights, including description of new species, and/or (3) incorrect identifications. Changes in the recently published International Code of Nomenclature for Algae, Fungi and Plants will lead to numerous name changes of existing Aspergillus and Penicillium species and an overview of the current names of biotechnologically important species is given. Furthermore, in (biotechnological) literature old and invalid names are still used, such as Aspergillus awamori, A. foetidus, A. kawachii, Talaromyces emersonii, Acremonium cellulolyticus, and Penicillium funiculosum. An overview of these and other species with their correct names is presented. Furthermore, the biotechnologically important species Talaromyces thermophilus is here combined in Thermomyces as Th. dupontii. The importance of Aspergillus, Penicillium, and related genera is also illustrated by the high number of undertaken genome sequencing projects. A number of these strains are incorrectly identified or atypical strains are selected for these projects. Recommendations for correct strain selection are given here. Phylogenetic analysis shows a close relationship between the genome-sequenced strains of Aspergillus, Penicillium, and Monascus. Talaromyces stipitatus and T. marneffei (syn. Penicillium marneffei) are closely related to Thermomyces lanuginosus and Th. dupontii (syn. Talaromyces thermophilus), and these species appear to be distantly related to Aspergillus and Penicillium. In the last part of

  3. Myosinome: a database of myosins from select eukaryotic genomes to facilitate analysis of sequence-structure-function relationships.

    PubMed

    Syamaladevi, Divya P; Sunitha, Margaret S; Kalaimathy, S; Reddy, Chandrashekar C; Iftekhar, Mohammed; Pasha, Shaik N; Sowdhamini, R

    2012-01-01

    Myosins are one of the largest protein superfamilies with 24 classes. They have conserved structural features and catalytic domains yet show huge variation at different domains resulting in a variety of functions. Myosins are molecules driving various kinds of cellular processes and motility up to the level of organisms. These are ATPases that utilize the chemical energy released by ATP hydrolysis to bring about conformational changes leading to a motor function. Myosins are important as they are involved in almost all cellular activities ranging from cell division to transcriptional regulation. They are crucial due to their involvement in many congenital diseases symptomatized by muscular malfunctions, cardiac diseases, deafness, neural and immunological dysfunction, and so on, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering, domain architectures of myosins and molecular models, structural analyses, and relevant literature of their coiled-coil domains. In the current version of Myosinome, information about 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans and S. cerevisiae) identified using bioinformatics surveys is presented, and several of them are yet to be functionally characterized. As these proteins are involved in congenital diseases, such a database would be useful in short-listing candidates for gene therapy and drug development. The database can be accessed from http://caps.ncbs.res.in/myosinome.

  4. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments

    PubMed Central

    Petryszak, Robert; Burdett, Tony; Fiorelli, Benedetto; Fonseca, Nuno A.; Gonzalez-Porta, Mar; Hastings, Emma; Huber, Wolfgang; Jupp, Simon; Keays, Maria; Kryvych, Nataliya; McMurry, Julie; Marioni, John C.; Malone, James; Megy, Karine; Rustici, Gabriella; Tang, Amy Y.; Taubert, Jan; Williams, Eleanor; Mannion, Oliver; Parkinson, Helen E.; Brazma, Alvis

    2014-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user. PMID:24304889

  5. A genome-wide gene-expression analysis and database in transgenic mice during development of amyloid or tau pathology.

    PubMed

    Matarin, Mar; Salih, Dervis A; Yasvoina, Marina; Cummings, Damian M; Guelfi, Sebastian; Liu, Wenfei; Nahaboo Solim, Muzammil A; Moens, Thomas G; Paublete, Rocio Moreno; Ali, Shabinah S; Perona, Marina; Desai, Roshni; Smith, Kenneth J; Latcham, Judy; Fulleylove, Michael; Richardson, Jill C; Hardy, John; Edwards, Frances A

    2015-02-03

    We provide microarray data comparing genome-wide differential expression and pathology throughout life in four lines of "amyloid" transgenic mice (mutant human APP, PSEN1, or APP/PSEN1) and "TAU" transgenic mice (mutant human MAPT gene). Microarray data were validated by qPCR and by comparison to human studies, including genome-wide association study (GWAS) hits. Immune gene expression correlated tightly with plaques whereas synaptic genes correlated negatively with neurofibrillary tangles. Network analysis of immune gene modules revealed six hub genes in hippocampus of amyloid mice, four in common with cortex. The hippocampal network in TAU mice was similar except that Trem2 had hub status only in amyloid mice. The cortical network of TAU mice was entirely different with more hub genes and few in common with the other networks, suggesting reasons for specificity of cortical dysfunction in FTDP17. This Resource opens up many areas for investigation. All data are available and searchable at http://www.mouseac.org.

  6. Human mapping databases.

    PubMed

    Talbot, C; Cuticchia, A J

    2001-05-01

    This unit concentrates on the data contained within two human genome databases, GDB (Genome Database) and OMIM (Online Mendelian Inheritance in Man), and includes discussion of different methods for submitting and accessing data. An understanding of electronic mail, FTP, and the use of a World Wide Web (WWW) navigational tool such as Netscape or Internet Explorer is a prerequisite for utilizing the information in this unit.

  7. Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database

    PubMed Central

    2013-01-01

    Background: Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. Methods: We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. Results: We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. Conclusions: About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma

  8. Previously unknown species of Aspergillus.

    PubMed

    Gautier, M; Normand, A-C; Ranque, S

    2016-08-01

    The use of multi-locus DNA sequence analysis has led to the description of previously unknown 'cryptic' Aspergillus species, whereas classical morphology-based identification of Aspergillus remains limited to the section or species-complex level. The current literature highlights two main features concerning these 'cryptic' Aspergillus species. First, the prevalence of such species in clinical samples is relatively high compared with emergent filamentous fungal taxa such as Mucorales, Scedosporium or Fusarium. Second, it is clearly important to identify these species in the clinical laboratory because of the high frequency of antifungal drug-resistant isolates of such Aspergillus species. Matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) has recently been shown to enable the identification of filamentous fungi with an accuracy similar to that of DNA sequence-based methods. As MALDI-TOF MS is well suited to the routine clinical laboratory workflow, it facilitates the identification of these 'cryptic' Aspergillus species at the routine mycology bench. The rapid establishment of enhanced filamentous fungi identification facilities will lead to a better understanding of the epidemiology and clinical importance of these emerging Aspergillus species. Based on routine MALDI-TOF MS-based identification results, we provide original insights into the key interpretation issues of a positive Aspergillus culture from a clinical sample. Which ubiquitous species that are frequently isolated from air samples are rarely involved in human invasive disease? Can both the species and the type of biological sample indicate Aspergillus carriage, colonization or infection in a patient? Highly accurate routine filamentous fungi identification is central to enhance the understanding of these previously unknown Aspergillus species, with a vital impact on further improved patient care.

  9. Characteristic clinical features of Aspergillus appendicitis: Case report and literature review.

    PubMed

    Gjeorgjievski, Mihajlo; Amin, Mitual B; Cappell, Mitchell S

    2015-11-28

    This work aims to facilitate diagnosing Aspergillus appendicitis, which can be missed clinically due to its rarity, by proposing a clinical pentad for Aspergillus appendicitis based on literature review and one new case. The currently reported case of pathologically-proven Aspergillus appendicitis was identified by computerized search of the pathology database at William Beaumont Hospital, 1999-2014. Prior cases were identified by computerized literature search. Among 10980 pathology reports of pathologically-proven appendicitis, one case of Aspergillus appendicitis was identified (rate = 0.01%). A young boy with profound neutropenia, recent chemotherapy, and acute myelogenous leukemia presented with right lower quadrant pain, pyrexia, and generalized malaise. Abdominal computed tomography scan showed a thickened appendiceal wall and periappendiceal inflammation, suggesting appendicitis. Emergent laparotomy showed an inflamed, thickened appendix, which was resected. The patient did poorly postoperatively with low-grade fevers while receiving antibacterial therapy, but rapidly improved after initiating amphotericin therapy. Microscopic examination of a silver stain of the appendectomy specimen revealed fungi with characteristic Aspergillus morphology, findings confirmed by immunohistochemistry. Primary Aspergillus appendicitis is exceptionally rare, with only 3 previously reported cases. All three cases presented with (1) neutropenia, (2) recent chemotherapy, (3) acute leukemia, and (4) suspected appendicitis; (5) the two prior cases initially treated with antibacterial therapy fared poorly before anti-Aspergillus therapy was instituted. The current patient satisfied all these five criteria. Based on these four cases, a clinical pentad is proposed for Aspergillus appendicitis: clinically-suspected appendicitis, neutropenia, recent chemotherapy, acute leukemia, and poor clinical response if treated solely by antibacterial/anti-candidal therapy. Patients presenting with

  10. Characteristic clinical features of Aspergillus appendicitis: Case report and literature review

    PubMed Central

    Gjeorgjievski, Mihajlo; Amin, Mitual B; Cappell, Mitchell S

    2015-01-01

    This work aims to facilitate diagnosing Aspergillus appendicitis, which can be missed clinically due to its rarity, by proposing a clinical pentad for Aspergillus appendicitis based on literature review and one new case. The currently reported case of pathologically-proven Aspergillus appendicitis was identified by computerized search of the pathology database at William Beaumont Hospital, 1999-2014. Prior cases were identified by computerized literature search. Among 10980 pathology reports of pathologically-proven appendicitis, one case of Aspergillus appendicitis was identified (rate = 0.01%). A young boy with profound neutropenia, recent chemotherapy, and acute myelogenous leukemia presented with right lower quadrant pain, pyrexia, and generalized malaise. Abdominal computed tomography scan showed a thickened appendiceal wall and periappendiceal inflammation, suggesting appendicitis. Emergent laparotomy showed an inflamed, thickened appendix, which was resected. The patient did poorly postoperatively with low-grade fevers while receiving antibacterial therapy, but rapidly improved after initiating amphotericin therapy. Microscopic examination of a silver stain of the appendectomy specimen revealed fungi with characteristic Aspergillus morphology, findings confirmed by immunohistochemistry. Primary Aspergillus appendicitis is exceptionally rare, with only 3 previously reported cases. All three cases presented with (1) neutropenia, (2) recent chemotherapy, (3) acute leukemia, and (4) suspected appendicitis; (5) the two prior cases initially treated with antibacterial therapy fared poorly before anti-Aspergillus therapy was instituted. The current patient satisfied all these five criteria. Based on these four cases, a clinical pentad is proposed for Aspergillus appendicitis: clinically-suspected appendicitis, neutropenia, recent chemotherapy, acute leukemia, and poor clinical response if treated solely by antibacterial/anti-candidal therapy. Patients presenting with

  11. RSSsite: a reference database and prediction tool for the identification of cryptic Recombination Signal Sequences in human and murine genomes.

    PubMed

    Merelli, Ivan; Guffanti, Alessandro; Fabbri, Marco; Cocito, Andrea; Furia, Laura; Grazini, Ursula; Bonnal, Raoul J; Milanesi, Luciano; McBlane, Fraser

    2010-07-01

    Recombination signal sequences (RSSs) flanking V, D and J gene segments are recognized and cut by the VDJ recombinase during development of B and T lymphocytes. All RSSs are composed of seven conserved nucleotides (the heptamer), followed by a spacer (containing either 12 +/- 1 or 23 +/- 1 poorly conserved nucleotides) and a conserved nonamer. Errors in V(D)J recombination, including cleavage of cryptic RSSs outside the immunoglobulin and T cell receptor loci, are associated with oncogenic translocations observed in some lymphoid malignancies. We present in this paper the RSSsite web server, which is available from the address http://www.itb.cnr.it/rss. RSSsite consists of a web-accessible database, RSSdb, for the identification of pre-computed potential RSSs, and of the related search tool, DnaGrab, which allows the scoring of potential RSSs in user-supplied sequences. This latter algorithm makes use of probability models, which can be recast as Bayesian networks, taking into account correlations between groups of positions of a sequence, developed starting from specific reference sets of RSSs. In validation laboratory experiments, we selected 33 predicted cryptic RSSs (cRSSs) from 11 chromosomal regions outside the immunoglobulin and TCR loci for functional testing.
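
    The heptamer-spacer-nonamer layout described in the abstract above can be sketched as a naive scan. This is not the DnaGrab algorithm, which the abstract says relies on probability models; it is a simple consensus-match count, assuming the commonly cited consensus heptamer CACAGTG and nonamer ACAAAAACC. The function names, the scoring threshold and the example sequence are hypothetical.

```python
HEPTAMER = "CACAGTG"      # commonly cited consensus heptamer
NONAMER = "ACAAAAACC"     # commonly cited consensus nonamer
SPACER_LENGTHS = (11, 12, 13, 22, 23, 24)  # 12 +/- 1 or 23 +/- 1


def count_matches(window, consensus):
    """Number of positions at which window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def scan_rss(sequence, min_score=14):
    """Return (start, spacer_length, score) for candidate RSS-like sites.

    The score is the number of heptamer plus nonamer positions matching
    the consensus (maximum 16); the spacer itself is not scored.
    """
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(HEPTAMER) + 1):
        heptamer = sequence[start:start + len(HEPTAMER)]
        for spacer in SPACER_LENGTHS:
            nonamer_start = start + len(HEPTAMER) + spacer
            nonamer = sequence[nonamer_start:nonamer_start + len(NONAMER)]
            if len(nonamer) < len(NONAMER):
                continue  # candidate would run past the end of the sequence
            score = count_matches(heptamer, HEPTAMER) + count_matches(nonamer, NONAMER)
            if score >= min_score:
                hits.append((start, spacer, score))
    return hits


# Hypothetical sequence with a perfect 12-bp-spacer site at offset 2
example = "GG" + HEPTAMER + "T" * 12 + NONAMER + "GG"
perfect_hits = scan_rss(example, min_score=16)
print(perfect_hits)
```

    A match count ignores the correlations between positions that the abstract highlights, so it serves only to make the site geometry concrete.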

  12. RegTransBase - A Database Of Regulatory Sequences and Interactions in a Wide Range of Prokaryotic Genomes

    SciTech Connect

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, Andrey A.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatory interactions in prokaryotes, captures the knowledge in published scientific literature using a controlled vocabulary. Although a number of databases describing interactions between regulatory proteins and their binding sites are currently being maintained, they focus mostly on the model organisms Escherichia coli and Bacillus subtilis, or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains various types of experimental data, in particular: the activation or repression of transcription by an identified direct regulator; determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA); mapping or prediction of binding site for a regulatory protein; characterization of regulatory mutations. Currently, the RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
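
    The use of position weight matrices to find candidate regulatory sites, as mentioned in the abstract above, can be sketched generically. This is not RegTransBase code or data: the log-odds construction with a pseudocount and a uniform background is one common convention, and the aligned sites, function names and scanned sequence are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def best_site(pwm, sequence):
    """Return (score, position) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[pos + i]] for i in range(width)), pos)
        for pos in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Hypothetical aligned binding sites and a sequence to scan
aligned_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
pwm = build_pwm(aligned_sites)
score, position = best_site(pwm, "GGCCTTGACAGGCC")
print(position, round(score, 2))
```

    In practice a score threshold is chosen and every window above it is reported, rather than only the single best window returned here.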

  13. Aspergillus fumigatus in Poultry

    PubMed Central

    Arné, Pascal; Thierry, Simon; Wang, Dongying; Deville, Manjula; Le Loc'h, Guillaume; Desoutter, Anaïs; Féménia, Françoise; Nieguitsila, Adélaïde; Huang, Weiyi; Chermette, René; Guillot, Jacques

    2011-01-01

    Aspergillus fumigatus remains a major respiratory pathogen in birds. In poultry, infection by A. fumigatus may induce significant economic losses particularly in turkey production. A. fumigatus develops and sporulates easily in poor quality bedding or contaminated feedstuffs in indoor farm environments. Inadequate ventilation and dusty conditions increase the risk of bird exposure to aerosolized spores. Acute cases are seen in young animals following inhalation of spores, causing high morbidity and mortality. The chronic form affects older birds and looks more sporadic. The respiratory tract is the primary site of A. fumigatus development leading to severe respiratory distress and associated granulomatous airsacculitis and pneumonia. Treatments for infected poultry are nonexistent; therefore, prevention is the only way to protect poultry. Development of avian models of aspergillosis may improve our understanding of its pathogenesis, which remains poorly understood. PMID:21826144

  14. Aspergillus antigen skin test (image)

    MedlinePlus

    ... After 48 to 72 hours the site of injection is evaluated by a physician. If a positive reaction occurs (the test site is inflamed), the person has been exposed to the aspergillus mold and is at risk for developing aspergillosis.

  15. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database

    PubMed Central

    Tian, Feng; Zhao, Jinlong; Kang, Zhenxing

    2017-01-01

    Background: Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The mechanism underlying its development is unclear. The aim of this study was to identify key genes that could serve as diagnostic biomarkers in lung SCC metastasis. Methods: We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and quantitative real-time polymerase chain reaction (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Results: Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The forty-nine DEGs in the two gene co-expression modules were used to construct a PPI network. CFTR, in the black module, was the hub protein, with connectivity to 182 genes. The qRT-PCR results showed that FIGF, SFTPD and DYNLRB2 were significantly down-regulated in tumor samples of lung SCC with metastasis, and that CFTR, SCGB3A2, SSTR1, SCTR and ROPN1L tended to be down-regulated in lung SCC with metastasis compared with lung SCC without metastasis. Conclusions: The dysregulated genes, including CFTR, SCTR and FIGF, might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC. PMID:28203405

  16. MDP, a database linking drug response data to genomic information, identifies dasatinib and statins as a combinatorial strategy to inhibit YAP/TAZ in cancer cells.

    PubMed

    Taccioli, Cristian; Sorrentino, Giovanni; Zannini, Alessandro; Caroli, Jimmy; Beneventano, Domenico; Anderlucci, Laura; Lolli, Marco; Bicciato, Silvio; Del Sal, Giannino

    2015-11-17

    Targeted anticancer therapies represent the most effective pharmacological strategies in terms of clinical responses. In this context, genetic alteration of several oncogenes represents an optimal predictor of response to targeted therapy. Integration of large-scale molecular and pharmacological data from cancer cell lines promises to be effective in the discovery of new genetic markers of drug sensitivity and of clinically relevant anticancer compounds. To define novel pharmacogenomic dependencies in cancer, we created the Mutations and Drugs Portal (MDP, http://mdp.unimore.it), a web accessible database that combines the cell-based NCI60 screening of more than 50,000 compounds with genomic data extracted from the Cancer Cell Line Encyclopedia and the NCI60 DTP projects. MDP can be queried for drugs active in cancer cell lines carrying mutations in specific cancer genes or for genetic markers associated to sensitivity or resistance to a given compound. As proof of performance, we interrogated MDP to identify both known and novel pharmacogenomics associations and unveiled an unpredicted combination of two FDA-approved compounds, namely statins and Dasatinib, as an effective strategy to potently inhibit YAP/TAZ in cancer cells.

  17. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  18. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  19. Tracheobronchial Manifestations of Aspergillus Infections

    PubMed Central

    Krenke, Rafal; Grabczak, Elzbieta M.

    2011-01-01

    Human lungs are constantly exposed to a large number of Aspergillus spores which are present in ambient air. These spores are usually harmless to immunocompetent subjects but can produce a symptomatic disease in patients with impaired antifungal defense. In a small percentage of patients, the trachea and bronchi may be the main or even the sole site of Aspergillus infection. The clinical entities that may develop in the tracheobronchial location include saprophytic, allergic and invasive diseases. Although this review is focused on invasive Aspergillus tracheobronchial infections, some aspects of allergic and saprophytic tracheobronchial diseases are also discussed in order to present the whole spectrum of tracheobronchial aspergillosis. To be consistent with clinical practice, an approach based on specific conditions predisposing to invasive Aspergillus tracheobronchial infections is used to present the differences in the clinical course and prognosis of these infections. Thus, invasive or potentially invasive Aspergillus airway diseases are discussed separately in three groups of patients: (1) lung transplant recipients, (2) highly immunocompromised patients with hematologic malignancies and/or patients undergoing hematopoietic stem cell transplantation, and (3) the remaining, less severely immunocompromised patients or even immunocompetent subjects. PMID:22194666

  20. Evaluation of Aspergillus PCR protocols for testing serum specimens.

    PubMed

    White, P Lewis; Mengoli, Carlo; Bretagne, Stéphane; Cuenca-Estrella, Manuel; Finnstrom, Niklas; Klingspor, Lena; Melchers, Willem J G; McCulloch, Elaine; Barnes, Rosemary A; Donnelly, J Peter; Loeffler, Juergen

    2011-11-01

    A panel of human serum samples spiked with various amounts of Aspergillus fumigatus genomic DNA was distributed to 23 centers within the European Aspergillus PCR Initiative to determine analytical performance of PCR. Information regarding specific methodological components and PCR performance was requested. The information provided was made anonymous, and meta-regression analysis was performed to determine any procedural factors that significantly altered PCR performance. Ninety-seven percent of protocols were able to detect a threshold of 10 genomes/ml on at least one occasion, with 83% of protocols reproducibly detecting this concentration. Sensitivity and specificity were 86.1% and 93.6%, respectively. Positive associations between sensitivity and the use of larger sample volumes, an internal control PCR, and PCR targeting the internal transcribed spacer (ITS) region were shown. Negative associations between sensitivity and the use of larger elution volumes (≥100 μl) and PCR targeting the mitochondrial genes were demonstrated. Most Aspergillus PCR protocols used to test serum generate satisfactory analytical performance. Testing serum requires less standardization, and the specific recommendations shown in this article will only improve performance.

  1. Cyclopiazonic Acid Biosynthesis of Aspergillus flavus and Aspergillus oryzae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cyclopiazonic acid (CPA) is an indole-tetramic acid neurotoxin produced by some of the same strains of A. flavus that produce aflatoxins and by some Aspergillus oryzae strains. Despite its discovery 40 years ago, few reviews of its toxicity and biosynthesis have been reported. This review examines w...

  2. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This report is provided each year to our stakeholders in the maize genetic community. In this report, we describe the five-year plan for MaizeGDB reviewed in early 2008 by the USDA-ARS peer review process and which was developed with inputs from our Working Group and the Allerton 2007 Report (MNL 82...

  3. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India.

    PubMed

    Masih, Aradhana; Singh, Pradeep K; Kathuria, Shallu; Agarwal, Kshitij; Meis, Jacques F; Chowdhary, Anuradha

    2016-09-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization-time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma.

  4. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India

    PubMed Central

    Masih, Aradhana; Singh, Pradeep K.; Kathuria, Shallu; Agarwal, Kshitij

    2016-01-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization–time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri. Matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma. PMID:27413188

  5. Sexual reproduction in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is the major producer of carcinogenic aflatoxins in crops worldwide and is also an important opportunistic human pathogen in aspergillosis. The sexual state of this heterothallic fungus is described from crosses between strains of the opposite mating type. Sexual reproduction oc...

  6. Sexual recombination in Aspergillus tubingensis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus tubingensis from section Nigri (Black Aspergilli) is closely related to A. niger and is used extensively in the industrial production of enzymes and organic acids. We recently discovered sexual reproduction in A. tubingensis and in this study, demonstrate that the progeny are products o...

  7. RNA-Seq-Based Transcriptome Analysis of Aflatoxigenic Aspergillus flavus in Response to Water Activity

    PubMed Central

    Zhang, Feng; Guo, Zhenni; Zhong, Hong; Wang, Sen; Yang, Weiqiang; Liu, Yongfeng; Wang, Shihua

    2014-01-01

    Aspergillus flavus is one of the most important producers of carcinogenic aflatoxins in crops, and the effect of water activity (aw) on growth and aflatoxin production of A. flavus has been previously studied. Here we found the strains under 0.93 aw exhibited decreased conidiation and aflatoxin biosynthesis compared to that under 0.99 aw. When RNA-Seq was used to delineate gene expression profile under different water activities, 23,320 non-redundant unigenes, with an average length of 1297 bp, were yielded. By database comparisons, 19,838 unigenes were matched well (e-value < 10^-5) with known gene sequences, and another 6767 novel unigenes were obtained by comparison to the current genome annotation of A. flavus. Based on the RPKM equation, 5362 differentially expressed unigenes (with |log2Ratio| ≥ 1) were identified between 0.99 aw and 0.93 aw treatments, including 3156 up-regulated and 2206 down-regulated unigenes, suggesting that A. flavus underwent an extensive transcriptome response during water activity variation. Furthermore, we found that the expression of 16 aflatoxin producing-related genes decreased obviously when water activity decreased, and the expression of 11 development-related genes increased after 0.99 aw treatment. Our data corroborate a model where water activity affects aflatoxin biosynthesis through increasing the expression of aflatoxin producing-related genes and regulating development-related genes. PMID:25421810
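
    The RPKM normalization and the |log2Ratio| ≥ 1 cutoff mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The function names, the pseudocount, and the choice of which condition is the numerator are assumptions made for illustration; the abstract does not give these details, and this is not the authors' pipeline.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_ratio(rpkm_a, rpkm_b, pseudocount=0.01):
    """log2 of rpkm_a / rpkm_b.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return math.log2((rpkm_a + pseudocount) / (rpkm_b + pseudocount))


def classify(rpkm_a, rpkm_b, threshold=1.0):
    """Label a unigene using the |log2Ratio| >= threshold rule."""
    ratio = log2_ratio(rpkm_a, rpkm_b)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical counts for one unigene in two libraries (not from the study).
    a = rpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=10_000_000)
    b = rpkm(read_count=200, gene_length_bp=2000, total_mapped_reads=10_000_000)
    print(a, b, classify(a, b))
```

    A threshold of 1 on the log2 scale corresponds to a twofold change in normalized expression in either direction.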

  8. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    PubMed Central

    Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991

  9. MIPS plant genome information resources.

    PubMed

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  10. Overexpression of Aspergillus tubingensis faeA in protease-deficient Aspergillus niger enables ferulic acid production from plant material.

    PubMed

    Zwane, Eunice N; Rose, Shaunita H; van Zyl, Willem H; Rumbold, Karl; Viljoen-Bloom, Marinda

    2014-06-01

    The production of ferulic acid esterase involved in the release of ferulic acid side groups from xylan was investigated in strains of Aspergillus tubingensis, Aspergillus carneus, Aspergillus niger and Rhizopus oryzae. The highest activity on triticale bran as sole carbon source was observed with the A. tubingensis T8.4 strain, which produced a type A ferulic acid esterase active against methyl p-coumarate, methyl ferulate and methyl sinapate. The activity of the A. tubingensis ferulic acid esterase (AtFAEA) was inhibited twofold by glucose and induced twofold in the presence of maize bran. An initial accumulation of endoglucanase was followed by the production of endoxylanase, suggesting a combined action with ferulic acid esterase on maize bran. A genomic copy of the A. tubingensis faeA gene was cloned and expressed in A. niger D15#26 under the control of the A. niger gpd promoter. The recombinant strain has reduced protease activity and does not acidify the media, therefore promoting high-level expression of recombinant enzymes. It produced 13.5 U/ml FAEA after 5 days on autoclaved maize bran as sole carbon source, which was threefold higher than for the A. tubingensis donor strain. The recombinant AtFAEA was able to extract 50 % of the available ferulic acid from non-pretreated maize bran, making this enzyme suitable for the biological production of ferulic acid from lignocellulosic plant material.

  11. Atypical Aspergillus parasiticus isolates from pistachio with aflR gene nucleotide insertion identical to Aspergillus sojae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are the most toxic and carcinogenic secondary metabolites produced primarily by the filamentous fungi Aspergillus flavus and Aspergillus parasiticus. The toxins cause devastating economic losses because of strict regulations on distribution of contaminated products. Aspergillus sojae are...

  12. Data-based Reconstruction of Gene Regulatory Networks of Fungal Pathogens

    PubMed Central

    Guthke, Reinhard; Gerber, Silvia; Conrad, Theresia; Vlaic, Sebastian; Durmuş, Saliha; Çakır, Tunahan; Sevilgen, F. E.; Shelest, Ekaterina; Linde, Jörg

    2016-01-01

    In the emerging field of systems biology of fungal infection, one of the central roles belongs to the modeling of gene regulatory networks (GRNs). Utilizing omics-data, GRNs can be predicted by mathematical modeling. Here, we review current advances of data-based reconstruction of both small-scale and large-scale GRNs for human pathogenic fungi. The advantage of large-scale genome-wide modeling is the possibility to predict central (hub) genes and thereby indicate potential biomarkers and drug targets. In contrast, small-scale GRN models provide hypotheses on the mode of gene regulatory interactions, which have to be validated experimentally. Due to the lack of sufficient quantity and quality of both experimental data and prior knowledge about regulator–target gene relations, the genome-wide modeling still remains problematic for fungal pathogens. While a first genome-wide GRN model has already been published for Candida albicans, the feasibility of such modeling for Aspergillus fumigatus is evaluated in the present article. Based on this evaluation, opinions are drawn on future directions of GRN modeling of fungal pathogens. The crucial point of genome-wide GRN modeling is the experimental evidence, both used for inferring the networks (omics ‘first-hand’ data as well as literature data used as prior knowledge) and for validation and evaluation of the inferred network models. PMID:27148247

  13. Genome-wide transcriptome analysis of Aspergillus fumigatus exposed to osmotic stress reveals regulators of osmotic and cell wall stresses that are SakA(HOG1) and MpkC dependent.

    PubMed

    Pereira Silva, Lilian; Alves de Castro, Patrícia; Dos Reis, Thaila Fernanda; Paziani, Mario Henrique; Von Zeska Kress, Márcia Regina; Riaño-Pachón, Diego M; Hagiwara, Daisuke; Ries, Laure N A; Brown, Neil Andrew; Goldman, Gustavo H

    2017-04-01

    Invasive aspergillosis is predominantly caused by Aspergillus fumigatus, and adaptations to stresses experienced within the human host are a prerequisite for the survival and virulence strategies of the pathogen. The central signal transduction pathway operating during hyperosmotic stress is the high osmolarity glycerol mitogen-activated protein kinase cascade. A. fumigatus MpkC and SakA, orthologues of the Saccharomyces cerevisiae Hog1p, constitute the primary regulator of the hyperosmotic stress response. We compared A. fumigatus wild-type transcriptional response to osmotic stress with the ΔmpkC, ΔsakA, and ΔmpkC ΔsakA strains. Our results strongly indicate that MpkC and SakA have independent and collaborative functions during the transcriptional response to transient osmotic stress. We have identified and characterized null mutants for four A. fumigatus basic leucine zipper proteins transcription factors. The atfA and atfB have comparable expression levels with the wild-type in ΔmpkC but are repressed in ΔsakA and ΔmpkC ΔsakA post-osmotic stress. The atfC and atfD have reduced expression levels in all mutants post-osmotic stress. The atfA-D null mutants displayed several phenotypes related to osmotic, oxidative, and cell wall stresses. The ΔatfA and ΔatfB were shown to be avirulent and to have attenuated virulence, respectively, in both Galleria mellonella and a neutropenic murine model of invasive pulmonary aspergillosis.

  14. Generation, annotation, and analysis of an extensive Aspergillus niger EST collection

    PubMed Central

    Semova, Natalia; Storms, Reginald; John, Tricia; Gaudet, Pascale; Ulycznyj, Peter; Min, Xiang Jia; Sun, Jian; Butler, Greg; Tsang, Adrian

    2006-01-01

    Background: Aspergillus niger, a saprophyte commonly found on decaying vegetation, is widely used and studied for industrial purposes. Despite its place as one of the most important organisms for commercial applications, the lack of available information about its genetic makeup limits research with this filamentous fungus. Results: We present here the analysis of 12,820 expressed sequence tags (ESTs) generated from A. niger cultured under seven different growth conditions. These ESTs identify about 5,108 genes of which 44.5% code for proteins sharing similarity (E ≤ 1e-5) with GenBank entries of known function, 38% code for proteins that only share similarity with GenBank entries of unknown function and 17.5% encode proteins that do not have a GenBank homolog. Using the Gene Ontology hierarchy, we present a first classification of the A. niger proteins encoded by these genes and compare its protein repertoire with other well-studied fungal species. We have established a searchable web-based database that includes the EST and derived contig sequences and their annotation. Details about this project and access to the annotated A. niger database are available. Conclusion: This EST collection and its annotation provide a significant resource for fundamental and applied research with A. niger. The gene set identified in this manuscript will be highly useful in the annotation of the genome sequence of A. niger, and the genes described in the manuscript, especially those encoding hydrolytic enzymes, will provide a valuable source for researchers interested in enzyme properties and applications. PMID:16457709

  15. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    PubMed

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  16. Discovery of a novel superfamily of type III polyketide synthases in Aspergillus oryzae.

    PubMed

    Seshime, Yasuyo; Juvvadi, Praveen Rao; Fujii, Isao; Kitamoto, Katsuhiko

    2005-05-27

    Identification of genes encoding type III polyketide synthase (PKS) superfamily members in the industrially useful filamentous fungus, Aspergillus oryzae, revealed that their distribution is not specific to plants or bacteria. Among other Aspergilli (Aspergillus nidulans and Aspergillus fumigatus), A. oryzae was unique in possessing four chalcone synthase (CHS)-like genes (csyA, csyB, csyC, and csyD). Expression of csyA, csyB, and csyD genes was confirmed by RT-PCR. Comparative genome analyses revealed single putative type III PKS in Neurospora crassa and Fusarium graminearum, two each in Magnaporthe grisea and Podospora anserina, and three in Phanerochaete chrysosporium, with a phylogenetic distinction from bacteria and plants. Conservation of catalytic residues in the CHSs across species implicated enzymatically active nature of these newly discovered homologs.

  17. Ecophysiological characterization of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger isolated from grapes in Spanish vineyards.

    PubMed

    García-Cela, E; Crespo-Sempere, A; Ramos, A J; Sanchis, V; Marin, S

    2014-03-03

    The aim of this study was to evaluate the diversity of black aspergilli isolated from berries from different agroclimatic regions of Spain. Growth characterization (in terms of temperature and water activity requirements) of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger was carried out on synthetic grape medium. A. tubingensis and A. niger showed higher maximum temperatures for growth (>45 °C versus 40-42 °C), and lower minimum aw requirements (0.83 aw versus 0.87 aw) than A. carbonarius. No differences in growth boundaries due to their geographical origin were found within A. niger aggregate isolates. Conversely, A. carbonarius isolates from the hotter and drier region grew and produced OTA at lower aw than other isolates. However, little genetic diversity in A. carbonarius was observed for the microsatellites tested and the same sequence of β-tubulin gene was observed; therefore intraspecific variability did not correlate with the geographical origin of the isolates or with their ability to produce OTA. Climatic change prediction points to drier and hotter climatic scenarios where A. tubingensis and A. niger could be even more prevalent over A. carbonarius, since they are better adapted to extreme high temperature and drier conditions.

  18. Genetics of Polyketide Metabolism in Aspergillus nidulans

    PubMed Central

    Klejnstrup, Marie L.; Frandsen, Rasmus J. N.; Holm, Dorte K.; Nielsen, Morten T.; Mortensen, Uffe H.; Larsen, Thomas O.; Nielsen, Jakob B.

    2012-01-01

    Secondary metabolites are small molecules that show large structural diversity and a broad range of bioactivities. Some metabolites are attractive as drugs or pigments while others act as harmful mycotoxins. Filamentous fungi have the capacity to produce a wide array of secondary metabolites including polyketides. The genes required for production of these metabolites are mostly organized in gene clusters, which often are silent or barely expressed under laboratory conditions, making discovery and analysis difficult. Fortunately, the genome sequences of several filamentous fungi are publicly available, greatly facilitating the establishment of links between genes and metabolites. This review covers the attempts being made to trigger the activation of polyketide metabolism in the fungal model organism Aspergillus nidulans. Moreover, it will provide an overview of the pathways where ten polyketide synthase genes have been coupled to polyketide products. Therefore, the proposed biosynthesis of the following metabolites will be presented: naphthopyrone, sterigmatocystin, aspyridones, emericellamides, asperthecin, asperfuranone, monodictyphenone/emodin, orsellinic acid, and the austinols. PMID:24957370

  19. Receptor-mediated signaling in Aspergillus fumigatus

    PubMed Central

    Grice, C. M.; Bertuzzi, M.; Bignell, E. M.

    2013-01-01

    Aspergillus fumigatus is the most pathogenic species among the Aspergilli, and the major fungal agent of human pulmonary infection. To prosper in diverse ecological niches, Aspergilli have evolved numerous mechanisms for adaptive gene regulation, some of which are also crucial for mammalian infection. Among the molecules which govern such responses, integral membrane receptors are thought to be the most amenable to therapeutic modulation. This is due to the localization of these molecular sensors at the periphery of the fungal cell, and to the prevalence of small molecules and licensed drugs which target receptor-mediated signaling in higher eukaryotic cells. In this review we highlight the progress made in characterizing receptor-mediated environmental adaptation in A. fumigatus and its relevance for pathogenicity in mammals. By presenting a first genomic survey of integral membrane proteins in this organism, we highlight an abundance of putative seven transmembrane domain (7TMD) receptors, the majority of which remain uncharacterized. Given the dependency of A. fumigatus upon stress adaptation for colonization and infection of mammalian hosts, and the merits of targeting receptor-mediated signaling as an antifungal strategy, a closer scrutiny of sensory perception and signal transduction in this organism is warranted. PMID:23430083

  20. Biological databases for human research.

    PubMed

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-02-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation.

  1. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access). This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  2. LAMP-PCR detection of ochratoxigenic Aspergillus species collected from peanut kernel.

    PubMed

    Al-Sheikh, H M

    2015-01-30

    Over the last decade, ochratoxin A (OTA) has been widely described and is ubiquitous in several agricultural products. Ochratoxins represent the second-most important mycotoxin group after aflatoxins. A total of 34 samples were surveyed from 3 locations, including Mecca, Madina, and Riyadh, Saudi Arabia, during 2012. Fungal contamination frequency was determined for surface-sterilized peanut seeds, which were seeded onto malt extract agar media. Aspergillus niger (35%), Aspergillus ochraceus (30%), and Aspergillus carbonarius (25%) were the most frequently observed Aspergillus species, while Aspergillus flavus and Aspergillus phoenicis isolates were only infrequently recovered and in small numbers (10%). OTA production was evaluated on yeast extract sucrose medium, which revealed that 57% of A. niger isolates, 60% of A. carbonarius isolates, and 100% of A. ochraceus isolates were OTA producers. Only one isolate, morphologically identified as A. carbonarius, and 3 A. niger isolates unstably produced OTA. A polymerase chain reaction (PCR)-based identification and detection assay was used to identify A. ochraceus isolates. Using the primer sets OCRA1/OCRA2, 400-base pair PCR fragments were produced only when genomic DNA from A. ochraceus isolates was used. Recently, the loop-mediated isothermal amplification assay using recombinase polymerase amplification chemistry was used for A. carbonarius and A. niger DNA identification. As a non-gel-based technique, the amplification product was directly visualized in the reaction tube after adding calcein for naked-eye examination.

  3. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  4. Development of RFLP-PCR method for the identification of medically important Aspergillus species using single restriction enzyme MwoI

    PubMed Central

    Diba, K.; Mirhendi, H.; Kordbacheh, P.; Rezaie, S.

    2014-01-01

    In this study we attempted to modify the PCR-RFLP method using restriction enzyme MwoI for the identification of medically important Aspergillus species. Our subjects included nine standard Aspergillus species and 205 Aspergillus isolates from confirmed hospital-acquired infections and hospital indoor sources. First, Aspergillus isolates were identified to the species level using morphologic methods. A 24-hour culture was performed for each isolate to harvest Aspergillus mycelia, and genomic DNA was then extracted using the phenol-chloroform method. PCR-RFLP using the single restriction enzyme MwoI was performed on the ITS regions of the rDNA gene. The electrophoresis data were analyzed and compared with those of the morphologic identifications. The 205 Aspergillus isolates included 153 (75%) environmental and 52 (25%) clinical isolates. A. flavus was the most frequent isolate in our study (55%), followed by A. niger 65 (31.7%), A. fumigatus 18 (8.7%), and A. nidulans and A. parasiticus 2 (1% each). MwoI enabled us to discriminate eight medically important Aspergillus species, including A. fumigatus, A. niger, and A. flavus as the most commonly isolated species. The PCR-RFLP method using the restriction enzyme MwoI is a rapid and reliable test for identification of at least the most medically important Aspergillus species. PMID:25242934
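
    The idea behind PCR-RFLP with a single enzyme can be sketched in silico: cut an amplicon at every recognition site and compare the resulting fragment lengths between species. The sketch below assumes the commonly cited MwoI recognition site GCNNNNN^NNGC (check the supplier's documentation before relying on it) and treats the amplicon as a linear top-strand sequence. The function name and the toy sequence are hypothetical and do not come from the study.

```python
import re

# Assumed MwoI recognition site: GC, seven unspecified bases, GC (GCNNNNN^NNGC).
# The lookahead lets overlapping sites be found as well.
MWOI_SITE = re.compile(r"(?=GC[ACGT]{7}GC)")
# Assumed cut position on the top strand: after the 7th base of the site.
CUT_OFFSET = 7


def digest_fragment_lengths(sequence):
    """Return fragment lengths after a complete digest of a linear amplicon."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in MWOI_SITE.finditer(seq)]
    lengths = []
    start = 0
    for cut in cuts:
        lengths.append(cut - start)
        start = cut
    lengths.append(len(seq) - start)
    return lengths


if __name__ == "__main__":
    # Toy amplicon with a single site (hypothetical, not a real ITS sequence).
    amplicon = "AAAA" + "GCAAAAAAAGC" + "TTTT"
    print(digest_fragment_lengths(amplicon))
```

    Species whose ITS amplicons yield distinct fragment-length patterns can be told apart on a gel; species with identical patterns cannot be separated by this enzyme alone.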

  5. The TIGR Maize Database.

    PubMed

    Chan, Agnes P; Pertea, Geo; Cheung, Foo; Lee, Dan; Zheng, Li; Whitelaw, Cathy; Pontaroli, Ana C; SanMiguel, Phillip; Yuan, Yinan; Bennetzen, Jeffrey; Barbazuk, William Brad; Quackenbush, John; Rabinowicz, Pablo D

    2006-01-01

    Maize is a staple crop of the grass family and also an excellent model for plant genetics. Owing to the large size and repetitiveness of its genome, we previously investigated two approaches to accelerate gene discovery and genome analysis in maize: methylation filtration and high C(0)t selection. These techniques allow the construction of gene-enriched genomic libraries by minimizing repeat sequences due to either their methylation status or their copy number, yielding a 7-fold enrichment in genic sequences relative to a random genomic library. Approximately 900,000 gene-enriched reads from maize were generated and clustered into Assembled Zea mays (AZM) sequences. Here we report the current AZM release, which consists of approximately 298 Mb representing 243,807 sequence assemblies and singletons. In order to provide a repository of publicly available maize genomic sequences, we have created the TIGR Maize Database (http://maize.tigr.org). In this resource, we have assembled and annotated the AZMs and used available sequenced markers to anchor AZMs to maize chromosomes. We have constructed a maize repeat database and generated draft sequence assemblies of 287 maize bacterial artificial chromosome (BAC) clone sequences, which we annotated along with 172 additional publicly available BAC clones. All sequences, assemblies and annotations are available at the project website via web interfaces and FTP downloads.

  6. Aspergillus Osteomyelitis of the Skull.

    PubMed

    Nicholson, Simon; King, Richard; Chumas, Paul; Russell, John; Liddington, Mark

    2016-07-01

    Osteomyelitis of the craniofacial skeleton is rare, with fungal pathogens least commonly implicated. The authors present 2 cases of osteomyelitis of the skull caused by Aspergillus spp. and discuss the diagnosis, clinicopathological course, and management strategies. Late recurrence seen in this type of infection warrants long-term follow-up and a high index of suspicion for the clinical signs associated with recurrence. Such patients would benefit from their surgical debridement being planned and managed via a specialist craniofacial unit, so as to utilize the most aesthetically sensitive approach and the experience of specialists from several surgical disciplines.

  7. Two novel species of Aspergillus section Nigri from indoor air

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus collinsii, Aspergillus floridensis, and Aspergillus trinidadensis are described as novel uniseriate species of Aspergillus section Nigri isolated from air samples. To describe the species we used phenotypes from 7-d Czapek yeast extract agar culture (CYA) and malt extract agar culture (M...

  8. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    PubMed

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production.

  9. Structural and functional analysis of the nor-1 gene involved in the biosynthesis of aflatoxins by Aspergillus parasiticus.

    PubMed Central

    Trail, F; Chang, P K; Cary, J; Linz, J E

    1994-01-01

    The nor-1 gene was cloned previously by complementation of a mutation (nor-1) in Aspergillus parasiticus SU-1 which blocked aflatoxin B1 biosynthesis, resulting in the accumulation of norsolorinic acid (NA). In this study, the nucleotide sequences of the cDNA and genomic DNA clones encompassing the coding region of the nor-1 gene were determined. The transcription initiation and polyadenylation sites of nor-1 were located by primer extension and RNase protection analyses and by comparison of the nucleotide sequences of the nor-1 genomic and cDNA clones. A plasmid, pNA51-82, was created for one-step disruption of the nor-1 gene by inserting a functional copy of the nitrate reductase (niaD) gene from A. parasiticus into the coding region of the nor-1 gene. Transformation of A. parasiticus NR-3 (niaD Afl+) with pNA51-82 resulted in niaD+ transformants that accumulated NA and produced reduced levels of aflatoxin as determined by thin-layer chromatography and enzyme-linked immunosorbent assay analyses of extracts from mycelia and the growth medium. Southern analysis of genomic DNA isolated from the NA-accumulating transformants indicated that the wild-type nor-1 gene in the chromosome had been replaced by the nonfunctional allele carried on pNA51-82. This recombinational inactivation event provides direct evidence that the nor-1 gene is functionally involved in aflatoxin biosynthesis. Comparison of the predicted nor-1 amino acid sequence with sequences in the GenBank and EMBL databases suggested that the protein is a member of the family of short-chain alcohol dehydrogenases, consistent with its proposed function as a keto reductase. PMID:7993094

  10. Protein Model Database

    SciTech Connect

    Fidelis, K; Adzhubej, A; Kryshtafovych, A; Daniluk, P

    2005-02-23

    The phenomenal success of the genome sequencing projects reveals the power of completeness in revolutionizing biological science. Currently it is possible to sequence entire organisms at a time, allowing for a systemic rather than fractional view of their organization and the various genome-encoded functions. There is an international plan to move towards a similar goal in the area of protein structure. This will not be achieved by experiment alone, but rather by a combination of efforts in crystallography, NMR spectroscopy, and computational modeling. Only a small fraction of structures are expected to be identified experimentally, with the remainder to be modeled. Presently there is no organized infrastructure to critically evaluate and present these data to the biological community. The goal of the Protein Model Database project is to create such infrastructure, including (1) a public database of theoretically derived protein structures; (2) reliable annotation of protein model quality; (3) novel structure analysis tools; and (4) access to the highest quality modeling techniques available.

  11. Beyond aflatoxin: four distinct expression patterns and functional roles associated with Aspergillus flavus secondary metabolism gene clusters

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species of Aspergillus produce a diverse array of secondary metabolites, and recent genomic analysis has predicted that these species have the capacity to synthesize many more compounds. It has been possible to infer the presence of 55 gene clusters associated with secondary metabolism in Aspergill...

  12. Comparison of expression of secondary metabolite biosynthesis cluster genes in Aspergillus flavus, A. parasiticus, and A. oryzae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    More than 55 secondary metabolite biosynthesis gene clusters are predicted to be present in the Aspergillus flavus genome. In spite of this the biosynthesis of only a few metabolites, such as the aflatoxin, cyclopiazonic acid and aflatrem, has been correlated with a particular gene cluster. Using RN...

  13. Diversity of Aspergillus oryzae genotypes (RFLP) isolated from traditional soy sauce production within Malaysia and Southeast Asia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA fingerprinting was performed on 64 strains of Aspergillus oryzae and one strain of A. sojae isolated from soy sauce factories within Malaysia and Southeast Asia that use primitive traditional methods in producing 'tamari type' Cantonese soy sauce. PstI digests of total genomic DNA from each isol...

  14. Aspergillus coronary embolization causing acute myocardial infarction.

    PubMed

    Laszewski, M; Trigg, M; de Alarcon, P; Giller, R

    1988-05-01

    An increased frequency of disseminated aspergillosis has been observed in the last decade, mostly occurring in immunocompromised patients including the bone marrow transplant population. Cardiac involvement by Aspergillus remains rare. We report the clinical and postmortem findings of an unusual case of Aspergillus pancarditis in a 7-year-old bone marrow transplant patient with Aspergillus embolization to the coronary arteries leading to a massive acute myocardial infarction. This case suggests that myocardial injury secondary to disseminated aspergillosis should be included in the differential diagnosis of chest pain in the immunocompromised pediatric patient.

  15. ord1, an oxidoreductase gene responsible for conversion of O-methylsterigmatocystin to aflatoxin in Aspergillus flavus.

    PubMed Central

    Prieto, R; Woloshuk, C P

    1997-01-01

    Among the enzymatic steps in the aflatoxin biosynthetic pathway, the conversion of O-methylsterigmatocystin to aflatoxin has been proposed to be catalyzed by an oxidoreductase. Transformants of Aspergillus flavus 649WAF2 containing a 3.3-kb genomic DNA fragment and the aflatoxin biosynthesis regulatory gene aflR converted exogenously supplied O-methylsterigmatocystin to aflatoxin B1. A gene, ord1, corresponding to a transcript of about 2 kb was identified within the 3.3-kb DNA fragment. The promoter region presented a putative AFLR binding site and a TATA sequence. The nucleotide sequence of the gene revealed an open reading frame encoding a protein of 528 amino acids with a deduced molecular mass of 60.2 kDa. The gene contained six introns and seven exons. Heterologous expression of the ord1 open reading frame under the transcriptional control of the Saccharomyces cerevisiae galactose-inducible gal1 promoter results in the ability to convert O-methylsterigmatocystin to aflatoxin B1. The data indicate that ord1 is sufficient to accomplish the last step of the aflatoxin biosynthetic pathway. A search of various databases for similarity indicated that ord1 encodes a cytochrome P-450-type monooxygenase, and the gene has been assigned to a new P-450 gene family named CYP64. PMID:9143099

  16. Genetic diversity of Aspergillus species isolated from onychomycosis and Aspergillus hongkongensis sp. nov., with implications to antifungal susceptibility testing.

    PubMed

    Tsang, Chi-Ching; Hui, Teresa W S; Lee, Kim-Chung; Chen, Jonathan H K; Ngan, Antonio H Y; Tam, Emily W T; Chan, Jasper F W; Wu, Andrea L; Cheung, Mei; Tse, Brian P H; Wu, Alan K L; Lai, Christopher K C; Tsang, Dominic N C; Que, Tak-Lun; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2016-02-01

    Thirteen Aspergillus isolates recovered from nails of 13 patients (fingernails, n=2; toenails, n=11) with onychomycosis were characterized. Twelve strains were identified by multilocus sequencing as Aspergillus spp. (Aspergillus sydowii [n=4], Aspergillus welwitschiae [n=3], Aspergillus terreus [n=2], Aspergillus flavus [n=1], Aspergillus tubingensis [n=1], and Aspergillus unguis [n=1]). Isolates of A. terreus, A. flavus, and A. unguis were also identifiable by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The 13th isolate (HKU49(T)) possessed unique morphological characteristics different from other Aspergillus spp. Molecular characterization also unambiguously showed that HKU49(T) was distinct from other Aspergillus spp. We propose the novel species Aspergillus hongkongensis to describe this previously unknown fungus. Antifungal susceptibility testing showed most Aspergillus isolates had low MICs against itraconazole and voriconazole, but all Aspergillus isolates had high MICs against fluconazole. A diverse spectrum of Aspergillus species is associated with onychomycosis. Itraconazole and voriconazole are probably better drug options for Aspergillus onychomycosis.

  17. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    PubMed

    Grötzinger, Stefan W; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available
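
    The reliability rule stated in this abstract (a hit counts as reliable when at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be illustrated with a minimal Python sketch. This is not the authors' PPM implementation: the GeneHit structure, the function name and the identifiers "GO:A", "GO:B" and "PS:X" are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GeneHit:
    """Descriptors found for one predicted gene (hypothetical structure)."""
    gene_id: str
    go_terms: set = field(default_factory=set)      # hits from the profile filter
    prosite_ids: set = field(default_factory=set)   # hits from the pattern filter


def is_reliable(hit, relevant_go, relevant_prosite, min_descriptors=2):
    """Return True if the gene carries at least `min_descriptors` relevant
    descriptors, counting GO-terms and PROSITE patterns together."""
    n_profile = len(hit.go_terms & relevant_go)
    n_pattern = len(hit.prosite_ids & relevant_prosite)
    return n_profile + n_pattern >= min_descriptors


# Placeholder identifiers, not real GO-terms or PROSITE IDs.
RELEVANT_GO = {"GO:A", "GO:B"}
RELEVANT_PROSITE = {"PS:X"}

hits = [
    GeneHit("gene1", {"GO:A"}, {"PS:X"}),   # one profile hit plus one pattern hit
    GeneHit("gene2", {"GO:A"}, set()),      # a single descriptor only
    GeneHit("gene3", {"GO:A", "GO:B"}),     # two profile hits, no pattern hit
]
reliable = [h.gene_id for h in hits if is_reliable(h, RELEVANT_GO, RELEVANT_PROSITE)]
print(reliable)  # ['gene1', 'gene3']
```

    Counting both descriptor types in one sum reflects the "and/or" wording of the abstract; a stricter variant that requires one hit from each filter would be a different rule.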

  18. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    PubMed Central

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  19. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  20. Aspergillus Infections in Transplant Recipients

    PubMed Central

    Singh, Nina; Paterson, David L.

    2005-01-01

    Aspergillus infections are occurring with an increasing frequency in transplant recipients. Notable changes in the epidemiologic characteristics of this infection have occurred; these include a change in risk factors and later onset of infection. Management of invasive aspergillosis continues to be challenging, and the mortality rate, despite the use of newer antifungal agents, remains unacceptably high. Performing molecular studies to discern new targets for antifungal activity, identifying signaling pathways that may be amenable to immunologic interventions, assessing combination regimens of antifungal agents or combining antifungal agents with modulation of the host defense mechanisms, and devising diagnostic assays that can rapidly and reliably diagnose infections represent areas for future investigations that may lead to further improvement in outcomes. PMID:15653818

  1. KdmB, a Jumonji Histone H3 Demethylase, Regulates Genome-Wide H3K4 Trimethylation and Is Required for Normal Induction of Secondary Metabolism in Aspergillus nidulans

    PubMed Central

    Gacek-Matthews, Agnieszka; Sasaki, Takahiko; Wittstein, Kathrin; Gruber, Clemens; Strauss, Joseph

    2016-01-01

    Histone posttranslational modifications (HPTMs) are involved in chromatin-based regulation of fungal secondary metabolite biosynthesis (SMB) in which the corresponding genes, usually physically linked in co-regulated clusters, are silenced under optimal physiological conditions (nutrient-rich) but are activated when nutrients are limiting. The exact molecular mechanisms by which HPTMs influence silencing and activation, however, are still not well understood. Here we show by a combined approach of quantitative mass spectrometry (LC-MS/MS), genome-wide chromatin immunoprecipitation (ChIP-seq) and transcriptional network analysis (RNA-seq) that the core regions of silent A. nidulans SM clusters generally carry low levels of all tested chromatin modifications and that heterochromatic marks flank most of these SM clusters. During secondary metabolism, histone marks typically associated with transcriptional activity such as H3 trimethylated at lysine-4 (H3K4me3) are established in some, but not all gene clusters even upon full activation. KdmB, a Jarid1-family histone H3 lysine demethylase predicted to comprise a BRIGHT domain, a zinc-finger and two PHD domains in addition to the catalytic Jumonji domain, targets and demethylates H3K4me3 in vivo and mediates transcriptional downregulation. Deletion of kdmB leads to increased transcription of about 1750 genes across nutrient-rich (primary metabolism) and nutrient-limiting (secondary metabolism) conditions. Unexpectedly, an equally high number of genes exhibited reduced expression in the kdmB deletion strain and notably, this group was significantly enriched for genes with known or predicted functions in secondary metabolite biosynthesis. Taken together, this study extends our general knowledge about multi-domain KDM5 histone demethylases and provides new details on the chromatin-level regulation of fungal secondary metabolite production. PMID:27548260

  2. Three new species of Aspergillus section Flavi isolated from almonds and maize in Portugal

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Three new aflatoxin-producing species belonging to Aspergillus section Flavi are described, Aspergillus mottae, Aspergillus sergii and Aspergillus transmontanensis. These species were isolated from Portuguese almonds and maize. An investigation examining morphology, extrolites and molecular data was...

  3. pgaA and pgaB encode two constitutively expressed endopolygalacturonases of Aspergillus niger.

    PubMed Central

    Parenicová, L; Benen, J A; Kester, H C; Visser, J

    2000-01-01

    The nucleotide sequence data for pgaA and pgaB have been deposited with the EMBL, GenBank and DDBJ Databases under accession numbers Y18804 and Y18805 respectively. pgaA and pgaB, two genes encoding endopolygalacturonases (PGs, EC 3.2.1.15) A and B, were isolated from a phage genomic library of Aspergillus niger N400. The 1167 bp protein coding region of the pgaA gene is interrupted by one intron, whereas the 1234 bp coding region of the pgaB gene contains two introns. The corresponding proteins, PGA and PGB, consist of 370 and 362 amino acid residues respectively. Northern-blot analysis revealed that pgaA- and pgaB-specific mRNAs accumulate in mycelia grown on sucrose. mRNAs are also present upon transfer to media containing D-galacturonic acid and pectin. Recombinant PGA and PGB were characterized with respect to pH optimum, activity on polygalacturonic acid, and mode of action and kinetics on oligogalacturonates of different chain length (n=3-7). At their pH optimum the specific activities in a standard assay for PGA (pH 4.2) and PGB (pH 5.0) were 16.5 µkat·mg⁻¹ and 8.3 µkat·mg⁻¹ respectively. Product progression analysis, using polygalacturonate as a substrate, revealed a random cleavage pattern for both enzymes and indicated processive behaviour for PGA. This result was confirmed by analysis of the mode of action using oligogalacturonates. Processivity was observed when the degree of polymerization of the substrate exceeded 6. Using pectins of various degrees of methyl esterification, it was shown that PGA and PGB both preferred partially methylated substrates. PMID:10642523

  4. [Overexpression of Aspergillus candidus lactase and analysis of enzymatic properties].

    PubMed

    Zhang, Wei; Fan, Yun-liu; Yao, Bin

    2005-04-01

    The lactase gene lacb' from Aspergillus candidus was fused downstream of the alpha-factor signal sequence in the Pichia pastoris expression vector pPIC9 and then integrated into the genome of P. pastoris by recombination. P. pastoris recombinants overexpressing lactase were screened by enzyme activity analysis and SDS-PAGE. The lactase expressed in P. pastoris was a glycosylated protein with an apparent molecular weight of 130 kD, while the lactase deglycosylated with Endo H had an apparent molecular weight of about 110 kD. The expression level of secreted lactase protein in recombinant P. pastoris was 6 mg/mL, with an enzymatic activity of 3600 U/mL in a 5 L fermenter, the highest level reported for any recombinant strain at the time. The optimal pH and temperature of the lactase are 5.2 and 60 °C. The Vmax, Km, and specific activity of the lactase are 3.3 micromol/min, 1.7 mmol/L and 706.5 +/- 2.6 U/mg, respectively. Compared with the lactase from Aspergillus oryzae ATCC 20423, the expressed lactase from A. candidus has better enzymatic properties, including high thermostability, high specific activity and a wide pH range for the enzyme reaction.
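
    The reported Vmax and Km can be read through the standard Michaelis-Menten rate law, v = Vmax * [S] / (Km + [S]). The sketch below is an illustration only: it assumes that the enzyme follows simple Michaelis-Menten kinetics under the assay conditions, which the abstract does not state explicitly, and the function name and the chosen substrate concentrations are hypothetical.

```python
V_MAX = 3.3   # micromol/min, value reported in the abstract
K_M = 1.7     # mmol/L, value reported in the abstract


def reaction_rate(substrate_mmol_per_l, v_max=V_MAX, k_m=K_M):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    if substrate_mmol_per_l < 0:
        raise ValueError("substrate concentration must be non-negative")
    return v_max * substrate_mmol_per_l / (k_m + substrate_mmol_per_l)


# Illustrative substrate concentrations (not taken from the paper).
for s in (0.5, 1.7, 17.0):
    print(f"[S] = {s:5.1f} mmol/L -> v = {reaction_rate(s):.2f} micromol/min")
```

    At [S] = Km the rate is half of Vmax (1.65 micromol/min), and at ten times Km it approaches but stays below Vmax.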

  5. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    SciTech Connect

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  6. Comparison of Multiple Methods for Determination of FCGR3A/B Genomic Copy Numbers in HapMap Asian Populations with Two Public Databases

    PubMed Central

    Qi, Yuan-yuan; Zhou, Xu-jie; Bu, Ding-fang; Hou, Ping; Lv, Ji-cheng; Zhang, Hong

    2016-01-01

    Low FCGR3 copy numbers (CNs) have been associated with susceptibility to several systemic autoimmune diseases. However, inconsistent associations have been reported, and errors caused by unreliable methods were suggested to be the major cause. A robust copy number determination method is thus warranted for large-scale case-control association studies, and this was the main focus of the current study. FCGR3 CNs of 90 HapMap Asians were first checked using four assays: paralog ratio test combined with restriction enzyme digest variant ratio (PRT-REDVR), real-time quantitative PCR (qPCR) using a TaqMan assay, real-time qPCR using SYBR Green dye, and short tandem repeat (STR) analysis. To improve the precision and reproducibility of the comparison, the results were compared with recently released sequencing data from the 1000 Genomes Project as well as whole-genome tiling BAC array data. The tendencies of samples that were inconsistent between these methods were also characterized. The refined in-house TaqMan qPCR assay showed the highest correlation with array-CGH results (r = 0.726, p < 0.001) and the highest concordance rate with the 1000 Genomes sequencing data (FCGR3A 91.76%, FCGR3B 85.88%, and FCGR3 81.18%). For samples with copy number variations, comprehensive analysis with multiple methods was required to improve detection accuracy. All of these methods tended to report higher copy numbers than direct sequencing. All four PCR-based CN determination methods (qPCR using TaqMan probes or SYBR Green, PRT, STR) were prone to higher estimation errors and thus may lead to artificial associations in large-scale case-control association studies. However, in contrast to previous reports, we observed that a properly refined TaqMan qPCR assay was not inferior to, or was even more accurate than, PRT when using sequencing data as the reference. PMID:28083015

  7. Data on the presence or absence of genes encoding essential proteins for ochratoxin and fumonisin biosynthesis in Aspergillus niger and Aspergillus welwitschiae

    PubMed Central

    Massi, Fernanda Pelisson; Sartori, Daniele; Ferranti, Larissa de Souza; Iamanaka, Beatriz Thie; Taniwaki, Marta Hiromi; Vieira, Maria Lucia Carneiro; Fungaro, Maria Helena Pelegrinelli

    2016-01-01

    We present the multiplex PCR data for the presence/absence of genes involved in OTA and FB2 biosynthesis in Aspergillus niger/Aspergillus welwitschiae strains isolated from different food substrates in Brazil. Among the 175 strains analyzed, four mPCR profiles were found: Profile 1 (17%) highlights strains harboring in their genome the pks, radH and the fum8 genes. Profile 2 (3.5%) highlights strains harboring genes involved in OTA biosynthesis i.e. radH and pks. Profile 3 (51.5%) highlights strains harboring the fum8 gene. Profile 4 (28%) highlights strains not carrying the genes studied herein. This research content is supplemental to our original research article, “Prospecting for the incidence of genes involved in ochratoxin and fumonisin biosynthesis in Brazilian strains of A. niger and A. welwitschiae” [1]. PMID:27054181
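
    The four mPCR profiles described in this abstract amount to a small decision rule on the presence or absence of three genes. The following Python sketch shows one way to express it. It assumes that Profile 2 means pks and radH without fum8 and that Profile 3 means fum8 alone, which is the natural reading but is not spelled out in the abstract; the function name is hypothetical, and gene combinations outside the four reported profiles return None.

```python
def mpcr_profile(pks, radH, fum8):
    """Map gene presence/absence (booleans) to the mPCR profile number.

    Profile 1: pks, radH and fum8 all present.
    Profile 2: pks and radH present, fum8 absent (OTA genes only; assumed).
    Profile 3: fum8 present, pks and radH absent (assumed).
    Profile 4: none of the three genes present.
    Any other combination was not reported and yields None.
    """
    if pks and radH and fum8:
        return 1
    if pks and radH:
        return 2
    if fum8 and not (pks or radH):
        return 3
    if not (pks or radH or fum8):
        return 4
    return None


# Shares of the 175 strains per profile, as reported in the abstract (percent).
REPORTED_SHARE = {1: 17.0, 2: 3.5, 3: 51.5, 4: 28.0}

print(mpcr_profile(True, True, True))     # 1
print(mpcr_profile(False, False, True))   # 3
print(sum(REPORTED_SHARE.values()))       # 100.0
```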

  8. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies.

  9. Aspergillus waksmanii sp. nov. and Aspergillus marvanovae sp. nov., two closely related species in section Fumigati

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two new and phylogenetically closely related species in Aspergillus section Fumigati are described and illustrated. Homothallic A. waksmanii was isolated from New Jersey soil (USA) and is represented by the ex-type isolate NRRL 179T (=CCF 4266= IBT 31900). Aspergillus marvanovae was isolated from wa...

  10. Aspergillus brasiliensis sp. nov., a biseriate black Aspergillus species with world-wide distribution.

    PubMed

    Varga, János; Kocsubé, Sándor; Tóth, Beáta; Frisvad, Jens C; Perrone, Giancarlo; Susca, Antonia; Meijer, Martin; Samson, Robert A

    2007-08-01

    A novel species, Aspergillus brasiliensis sp. nov., is described within Aspergillus section Nigri. This species can be distinguished from other black aspergilli based on intergenic transcribed region, beta-tubulin and calmodulin gene sequences, by amplified fragment length polymorphism analysis and by extrolite profiles. A. brasiliensis isolates produced naphtho-gamma-pyrones, tensidol A and B and pyrophen in common with Aspergillus niger and Aspergillus tubingensis, but also several unique compounds, justifying their treatment as representing a separate species. None of the isolates were found to produce ochratoxin A, kotanins, funalenone or pyranonigrins. The novel species was most closely related to A. niger, and was isolated from soil from Brazil, Australia, USA and The Netherlands, and from grape berries from Portugal. The type strain of Aspergillus brasiliensis sp. nov. is CBS 101740(T) (=IMI 381727(T)=IBT 21946(T)).

  11. Database tools in genetic diseases research.

    PubMed

    Bianco, Anna Monica; Marcuzzi, Annalisa; Zanin, Valentina; Girardelli, Martina; Vuch, Josef; Crovella, Sergio

    2013-02-01

    The knowledge of the human genome is in continuous progression: a large number of databases have been developed to make meaningful connections among worldwide scientific discoveries. This paper reviews bioinformatics resources and database tools specialized in disseminating information regarding genetic disorders. The databases described are useful for managing sample sequences, gene expression and post-transcriptional regulation. In relation to data sets available from genome-wide association studies, we describe databases that could be the starting point for developing studies in the field of complex diseases, particularly those in which the causal genes are difficult to identify.

  12. Analytical and computational approaches to define the Aspergillus niger secretome

    SciTech Connect

    Tsang, Adrian; Butler, Gregory D.; Powlowski, Justin; Panisko, Ellen A.; Baker, Scott E.

    2009-03-01

    We used computational and mass spectrometric approaches to characterize the Aspergillus niger secretome. The 11,200 gene models predicted in the genome of A. niger strain ATCC 1015 were the data source for the analysis. Depending on the computational methods used, 691 to 881 proteins were predicted to be secreted proteins. We cultured A. niger in six different media and analyzed the extracellular proteins produced using mass spectrometry. A total of 222 proteins were identified, with 39 proteins expressed under all six conditions and 74 proteins expressed under only one condition. The secreted proteins identified by mass spectrometry were used to guide the correction of about 20 gene models. Additional analysis focused on extracellular enzymes of interest for biomass processing. Of the 63 glycoside hydrolases predicted to be capable of hydrolyzing cellulose, hemicellulose or pectin, 94% of the exo-acting enzymes and only 18% of the endo-acting enzymes were experimentally detected.

  13. Sesterterpene ophiobolin biosynthesis involving multiple gene clusters in Aspergillus ustus

    PubMed Central

    Chai, Hangzhen; Yin, Ru; Liu, Yongfeng; Meng, Huiying; Zhou, Xianqiang; Zhou, Guolin; Bi, Xupeng; Yang, Xue; Zhu, Tonghan; Zhu, Weiming; Deng, Zixin; Hong, Kui

    2016-01-01

    Terpenoids are the most diverse and abundant natural products among which sesterterpenes account for less than 2%, with very few reports on their biosynthesis. Ophiobolins are tricyclic 5–8–5 ring sesterterpenes with potential pharmaceutical application. Aspergillus ustus 094102 from mangrove rhizosphere produces ophiobolin and other terpenes. We obtained five gene cluster knockout mutants, with altered ophiobolin yield using genome sequencing and in silico analysis, combined with in vivo genetic manipulation. Involvement of the five gene clusters in ophiobolin synthesis was confirmed by investigation of the five key terpene synthesis relevant enzymes in each gene cluster, either by gene deletion and complementation or in vitro verification of protein function. The results demonstrate that ophiobolin skeleton biosynthesis involves five gene clusters, which are responsible for C15, C20, C25, and C30 terpenoid biosynthesis. PMID:27273151

  14. In-silico analysis of Aspergillus niger beta-glucosidases

    NASA Astrophysics Data System (ADS)

    Yeo S., L.; Shazilah, K.; Suhaila, S.; Abu Bakar F., D.; Murad A. M., A.

    2014-09-01

    Genomic data mining revealed a total of seventeen β-glucosidases in the filamentous fungus Aspergillus niger. Two of them belonged to glycoside hydrolase family 1 (GH1), while the rest belonged to family 3 (GH3). These proteins were then named according to the nomenclature proposed by the International Union of Biochemistry (IUB), starting from the lowest pI and glycoside hydrolase family. Their properties were predicted using various bioinformatic tools, which showed the presence of signal peptide domains and active sites. Interestingly, one particular domain, PA14 (protective antigen), predicted to be involved in carbohydrate binding, was present in four of the enzymes. A phylogenetic tree grouped the two glycoside hydrolase families with GH1- and GH3-related organisms. This study suggests that the various domains present in these β-glucosidases are crucial for the survival of this fungus, as supported by other analyses.

  15. A novel selectable marker based on Aspergillus niger arginase expression.

    PubMed

    Dave, Kashyap; Ahuja, Manmeet; Jayashri, T N; Sirola, Rekha Bisht; Punekar, Narayan S

    2012-06-10

    Selectable markers are valuable tools in transforming asexual fungi like Aspergillus niger. An arginase (agaA) expression vector and a suitable arginase-disrupted host would define a novel nutritional marker/selection for transformation. The development of such a marker was successfully achieved in two steps. The single genomic copy of the A. niger arginase gene was disrupted by homologous integration of the bar marker. The agaA disruptant was subsequently complemented by transforming it with agaA expression vectors. Both citA and trpC promoters were able to drive the expression of arginase cDNA. Such agaA+ transformants displayed an arginase expression pattern distinct from that of the parent strain. The results are also consistent with a single catabolic route for arginine in this fungus. A simple yet novel arginine-based selection for filamentous fungal transformation is thus described.

  16. Prototyping a genetics deductive database

    SciTech Connect

    Hearne, C.; Cui, Zhan; Parsons, S.; Hajnal, S.

    1994-12-31

    We are developing a laboratory notebook system known as the Genetics Deductive Database. Currently our prototype provides storage for biological facts and rules with flexible access via an interactive graphical display. We have introduced a formal basis for the representation and reasoning necessary to order genome map data and handle the uncertainty inherent in biological data. We aim to support laboratory activities by introducing an experiment planner into our prototype. The Genetics Deductive Database is built using new database technology which provides an object-oriented conceptual model, a declarative rule language, and a procedural update language. This combination of features allows the implementation of consistency maintenance, automated reasoning, and data verification.

  17. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry for fast and accurate identification of clinically relevant Aspergillus species.

    PubMed

    Alanio, A; Beretti, J-L; Dauphin, B; Mellado, E; Quesne, G; Lacroix, C; Amara, A; Berche, P; Nassif, X; Bougnoux, M-E

    2011-05-01

    New Aspergillus species have recently been described with the use of multilocus sequencing in refractory cases of invasive aspergillosis. The classical phenotypic identification methods routinely used in clinical laboratories failed to identify them adequately. Some of these Aspergillus species have specific patterns of susceptibility to antifungal agents, and misidentification may lead to inappropriate therapy. We developed a matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS)-based strategy to adequately identify Aspergillus species to the species level. A database including the reference spectra of 28 clinically relevant species from seven Aspergillus sections (five common and 23 unusual species) was engineered. The profiles of young and mature colonies were analysed for each reference strain, and species-specific spectral fingerprints were identified. The performance of the database was then tested on 124 clinical and 16 environmental isolates previously characterized by partial sequencing of the β-tubulin and calmodulin genes. One hundred and thirty-eight isolates of 140 (98.6%) were correctly identified. Two atypical isolates could not be identified, but no isolate was misidentified (specificity: 100%). The database, including species-specific spectral fingerprints of young and mature colonies of the reference strains, allowed identification regardless of the maturity of the clinical isolate. These results indicate that MALDI-TOF MS is a powerful tool for rapid and accurate identification of both common and unusual species of Aspergillus. It can give better results than morphological identification in clinical laboratories.
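
    The performance figures quoted in the abstract above can be re-derived from the isolate counts. The following is a minimal sketch; the function and variable names are illustrative assumptions, and treating "specificity" as the share of identified isolates that were identified correctly is only one reading of the abstract, not necessarily the authors' definition.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken from the abstract.
clinical_isolates = 124
environmental_isolates = 16
correctly_identified = 138
not_identified = 2

total_isolates = clinical_isolates + environmental_isolates
# Isolates that received a wrong species name (none, per the abstract).
misidentified = total_isolates - correctly_identified - not_identified

identification_rate = percent(correctly_identified, total_isolates)
# Assumed reading: of the isolates that received an identification,
# what share received the correct one?
identified = correctly_identified + misidentified
share_correct_among_identified = percent(correctly_identified, identified)

print(total_isolates, identification_rate, share_correct_among_identified)
```

    Keeping "not identified" separate from "misidentified" matters here: the two atypical isolates lower the identification rate without counting as errors.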

  18. Taxonomy, chemodiversity, and chemoconsistency of Aspergillus, Penicillium, and Talaromyces species

    PubMed Central

    Frisvad, Jens C.

    2014-01-01

    Aspergillus, Penicillium, and Talaromyces are among the most chemically inventive of all fungi, producing a wide array of secondary metabolites (exometabolites). The three genera are holophyletic in a cladistic sense and polythetic classes in an anagenetic or functional sense, and contain 344, 354, and 88 species, respectively. New developments in classification, cladification, and nomenclature have meant that the species, series, and sections suggested are natural groups that share many extrolites, including exometabolites, exoproteins, exocarbohydrates, and exolipids in addition to morphological features. The number of exometabolites reported from these species is very large, and genome sequencing projects have shown that a large number of additional exometabolites may be expressed, given the right conditions (“cryptic” gene clusters for exometabolites). The exometabolites are biosynthesized via shikimic acid, tricarboxylic acid cycle members, nucleotides, carbohydrates or as polyketides, non-ribosomal peptides, terpenes, or mixtures of those. The gene clusters coding for these compounds contain genes for the biosynthetic building blocks, the linking of these building blocks, tailoring enzymes, resistance for own products, and exporters. Species within a series or section in Aspergillus, Penicillium, and Talaromyces have many exometabolites in common, seemingly acquired by cladogenesis, but some of the gene clusters for autapomorphic exometabolites may have been acquired by horizontal gene transfer. Despite genome sequencing efforts, and the many breakthroughs these will give, it is obvious that epigenetic factors play a large role in evolution and function of chemodiversity, and better methods for characterizing the epigenome are needed. Most of the individual species of the three genera produce a consistent and characteristic profile of exometabolites, but growth medium variations, stimulation by exometabolites from other species, and variations in abiotic

  19. Comparison of species composition and fumonisin production in Aspergillus section Nigri populations in maize kernels from USA and Italy.

    PubMed

    Susca, Antonia; Moretti, Antonio; Stea, Gaetano; Villani, Alessandra; Haidukowski, Miriam; Logrieco, Antonio; Munkvold, Gary

    2014-10-01

    non-producing strains distributed among the clades: A. welwitschiae, A. niger group 1 and A. niger group 2, confirming the potential of Aspergillus sect. Nigri species to contribute to total fumonisin contamination of maize. A higher percentage of A. niger isolates (72.0%) produced FB2 compared to A. welwitschiae (36.6%). The percentage of FB2-producing A. niger strains was similar in the USA and Italian populations; however, the predominance of A. niger in the USA population suggests a higher potential for fumonisin production. Some strains with fum8 present in the genome did not produce FB2 in vitro, confirming the ineffectiveness of fum8 presence as a predictor of FB2 production.

  20. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.
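
    The idea of storing experiment results centrally and querying them later can be sketched with a small relational table. This is an illustration only: the table layout, the column names, and every value below are invented for the example and are not taken from the authors' experiment database.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of (made-up) machine learning runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        " algorithm TEXT, dataset TEXT, parameters TEXT, accuracy REAL)"
    )
    # Hypothetical rows; real entries would come from stored experiments.
    rows = [
        ("decision_tree", "toy_a", "max_depth=3", 0.81),
        ("decision_tree", "toy_b", "max_depth=3", 0.74),
        ("naive_bayes", "toy_a", "", 0.78),
        ("naive_bayes", "toy_b", "", 0.80),
    ]
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)
    return conn

def mean_accuracy_by_algorithm(conn):
    """Answer a question across all stored runs without rerunning anything."""
    cursor = conn.execute(
        "SELECT algorithm, AVG(accuracy) FROM runs"
        " GROUP BY algorithm ORDER BY algorithm"
    )
    return {name: round(avg, 3) for name, avg in cursor}

conn = build_demo_db()
summary = mean_accuracy_by_algorithm(conn)
print(summary)
```

    Recording the parameters alongside each result is what lets a later query compare runs on equal terms instead of relying on the condensed numbers reported in a paper.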

  1. The Volatome of Aspergillus fumigatus

    PubMed Central

    Calvo, A. M.; Latgé, J. P.

    2014-01-01

    Early detection of invasive aspergillosis is absolutely required for efficient therapy of this fungal infection. The identification of fungal volatiles in patient breath can be an alternative for the detection of Aspergillus fumigatus that still remains problematic. In this work, we investigated the production of volatile organic compounds (VOCs) by A. fumigatus in vitro, and we show that volatile production depends on the nutritional environment. A. fumigatus produces a multiplicity of VOCs, predominantly terpenes and related compounds. The production of sesquiterpenoid compounds was found to be strongly induced by increased iron concentrations and certain drugs, i.e., pravastatin. Terpenes that were always detectable in large amounts were α-pinene, camphene, and limonene, as well as sesquiterpenes, identified as α-bergamotene and β-trans-bergamotene. Other substance classes that were found to be present in the volatome, such as 1-octen-3-ol, 3-octanone, and pyrazines, were found only under specific growth conditions. Drugs that interfere with the terpene biosynthesis pathway influenced the composition of the fungal volatome, and most notably, a block of sesquiterpene biosynthesis by the bisphosphonate alendronate fundamentally changed the VOC composition. Using deletion mutants, we also show that a terpene cyclase and a putative kaurene synthase are essential for the synthesis of volatile terpenes by A. fumigatus. The present analysis of in vitro volatile production by A. fumigatus suggests that VOCs may be used in the diagnosis of infections caused by this fungus. PMID:24906414

  2. Session title: Distributed and intelligent databases

    SciTech Connect

    Argos, P.; Mewes, H.W.; Frishman, D.

    1996-12-31

    This session focuses on the recent advances in the delivery of information to the biological community concerning genome sequencing and related information. New approaches include interconnecting existing databases, knowledge-based expert systems, interface languages and multiserver management.

  3. ASPERGILLUS LUCHUENSIS, AN INDUSTRIALLY IMPORTANT BLACK ASPERGILLUS IN EAST ASIA

    PubMed Central

    Hong, Seung-Beom; Lee, Mina; Kim, Dae-Ho; Varga, Janos; Frisvad, Jens C.; Perrone, Giancarlo; Gomi, Katsuya; Yamada, Osamu; Machida, Masayuki; Houbraken, Jos; Samson, Robert A.

    2013-01-01

    Aspergilli known as black- and white-koji molds, which are used for awamori, shochu, makgeolli and other food and beverage fermentations, are reported in the literature as A. luchuensis, A. awamori, A. kawachii, or A. acidus. In order to elucidate the taxonomic position of these species, available ex-type cultures were compared based on morphology and molecular characters. A. luchuensis, A. kawachii and A. acidus showed the same banding patterns in RAPD, and the three species had the same rDNA-ITS, β-tubulin and calmodulin sequences and these differed from those of the closely related A. niger and A. tubingensis. Morphologically, the three species are not significantly different from each other or from A. niger and A. tubingensis. It is concluded that A. luchuensis, A. kawachii and A. acidus are the same species, and A. luchuensis is selected as the correct name based on priority. Strains of A. awamori which are stored in the National Research Institute of Brewing in Japan represent A. niger (n = 14) and A. luchuensis (n = 6). The neotype of A. awamori (CBS 557.65 = NRRL 4948) does not originate from awamori fermentation and it is shown to be identical with the unknown taxon Aspergillus welwitschiae. Extrolite analysis of strains of A. luchuensis showed that they do not produce mycotoxins and therefore can be considered safe for food and beverage fermentations. A. luchuensis is also frequently isolated from meju and nuruk in Korea and Puerh tea in China and the species is probably common in the fermentation environment of East Asia. A re-description of A. luchuensis is provided because of the incomplete data in the original literature. PMID:23723998

  4. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access) These solubilities are compiled from 18 volumes of the International Union for Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  5. [Aspergillus fumigatus endocarditis in a patient with a biventricular pacemaker].

    PubMed

    Cuesta, José M; Fariñas, María C; Rodilla, Irene G; Salesa, Ricardo; de Berrazueta, José R

    2005-05-01

    Aspergillus fumigatus endocarditis is one of the rarest and severest complications in cardiological patients. We describe a patient with an intracardial pacemaker who was diagnosed as having Aspergillus fumigatus endocarditis. Postmortem examination showed a large, Aspergillus-infected thrombus encased in the right ventricle, pulmonary trunk and main pulmonary branches.

  6. Misidentification of Aspergillus nomius and Aspergillus tamarii as Aspergillus flavus: characterization by internal transcribed spacer, β-Tubulin, and calmodulin gene sequencing, metabolic fingerprinting, and matrix-assisted laser desorption ionization-time of flight mass spectrometry.

    PubMed

    Tam, Emily W T; Chen, Jonathan H K; Lau, Eunice C L; Ngan, Antonio H Y; Fung, Kitty S C; Lee, Kim-Chung; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2014-04-01

    Aspergillus nomius and Aspergillus tamarii are Aspergillus species that phenotypically resemble Aspergillus flavus. In the last decade, a number of case reports have identified A. nomius and A. tamarii as causes of human infections. In this study, using an internal transcribed spacer, β-tubulin, and calmodulin gene sequencing, only 8 of 11 clinical isolates reported as A. flavus in our clinical microbiology laboratory by phenotypic methods were identified as A. flavus. The other three isolates were A. nomius (n = 2) or A. tamarii (n = 1). The results corresponded with those of metabolic fingerprinting, in which the A. flavus, A. nomius, and A. tamarii strains were separated into three clusters based on ultra-high-performance liquid chromatography-tandem mass spectrometry (UHPLC MS) analysis. The first two patients with A. nomius infections had invasive aspergillosis and chronic cavitary and fibrosing pulmonary and pleural aspergillosis, respectively, whereas the third patient had A. tamarii colonization of the airway. Identification of the 11 clinical isolates and three reference strains by matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) showed that only six of the nine strains of A. flavus were identified correctly. None of the strains of A. nomius and A. tamarii was correctly identified. β-Tubulin or the calmodulin gene should be the gene target of choice for identifying A. flavus, A. nomius, and A. tamarii. To improve the usefulness of MALDI-TOF MS, the number of strains for each species in MALDI-TOF MS databases should be expanded to cover intraspecies variability.

  7. PPD - Proteome Profile Database.

    PubMed

    Sakharkar, Kishore R; Chow, Vincent T K

    2004-01-01

    With the complete sequencing of multiple genomes, there have been extensions in the methods of sequence analysis from single gene/protein-based to analyzing multiple genes and proteins simultaneously. Therefore, there is a demand for user-friendly software tools that will allow mining of these enormous datasets. PPD is a WWW-based database for comparative analysis of protein lengths in completely sequenced prokaryotic and eukaryotic genomes. PPD's core objective is to create protein classification tables based on the lengths of proteins by specifying a set of organisms and parameters. The interface can also generate information on changes in proteins of specific length distributions. This feature is of importance when the user's interest is focused on some evolutionarily related organisms or on organisms with similar or related tissue specificity or life-style. PPD is available at: PPD Home.
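
    A length-based classification table of the kind described above can be approximated by counting proteins per length bin for each organism. The sketch below is not PPD's implementation; the bin width, the function names, the organism labels, and the protein lengths are all hypothetical.

```python
from collections import Counter

def length_table(lengths, bin_size=100):
    """Count proteins per length bin; keys are (first, last) residues of each bin."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return {
        (start, start + bin_size - 1): counts[start]
        for start in sorted(counts)
    }

def compare_organisms(proteomes, bin_size=100):
    """Build one length table per organism for side-by-side comparison."""
    return {
        name: length_table(lengths, bin_size)
        for name, lengths in proteomes.items()
    }

# Hypothetical protein lengths (in residues) for two placeholder organisms.
demo = {
    "organism_1": [50, 120, 180, 305],
    "organism_2": [95, 410, 455],
}
tables = compare_organisms(demo)
print(tables)
```

    Exposing the bin width as a parameter mirrors the abstract's point that the user specifies both the organisms and the parameters of the comparison.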

  8. Secretome data from Trichoderma reesei and Aspergillus niger cultivated in submerged and sequential fermentation methods.

    PubMed

    Florencio, Camila; Cunha, Fernanda M; Badino, Alberto C; Farinas, Cristiane S; Ximenes, Eduardo; Ladisch, Michael R

    2016-09-01

    The cultivation procedure and the fungal strain applied for enzyme production may influence levels and profile of the proteins produced. The proteomic analysis data presented here provide critical information to compare proteins secreted by Trichoderma reesei and Aspergillus niger when cultivated through submerged and sequential fermentation processes, using steam-explosion sugarcane bagasse as inducer for enzyme production. The proteins were organized according to the families described in CAZy database as cellulases, hemicellulases, proteases/peptidases, cell-wall-protein, lipases, others (catalase, esterase, etc.), glycoside hydrolases families, predicted and hypothetical proteins. Further detailed analysis of this data is provided in "Secretome analysis of Trichoderma reesei and Aspergillus niger cultivated by submerged and sequential fermentation process: enzyme production for sugarcane bagasse hydrolysis" C. Florencio, F.M. Cunha, A.C Badino, C.S. Farinas, E. Ximenes, M.R. Ladisch (2016) [1].

  9. WheatGenome.info: A Resource for Wheat Genomics Resource.

    PubMed

    Lai, Kaitao

    2016-01-01

    An integrated database with a variety of Web-based systems named WheatGenome.info hosting wheat genome and genomic data has been developed to support wheat research and crop improvement. The resource includes multiple Web-based applications, which are implemented as a variety of Web-based systems. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This portal provides links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/ .

  10. Aspergillus species cystitis in a cat.

    PubMed

    Adamama-Moraitou, K K; Paitaki, C G; Rallis, T S; Tontis, D

    2001-03-01

    A Persian male cat with a history of lower urinary tract disease was presented because of polydipsia, polyuria, constipation and nasal discharge. Ten weeks before admission, the cat had been treated for lower urinary tract disease by catheterisation and flushing of the bladder. The animal was thin, dehydrated, anaemic and azotaemic. Urine culture revealed Aspergillus species cystitis. Antibodies against Aspergillus nidulans were identified in serum. Fluconazole was administered orally (7.5 mg/kg, q 12 h) for 10 consecutive weeks. The azotaemia was resolved, the kidney concentrating ability was recovered and the cat has remained healthy without similar problems.

  11. Rice Glycosyltransferase (GT) Phylogenomic Database

    DOE Data Explorer

    Ronald, Pamela

    The Ronald Laboratory staff at the University of California-Davis has a primary research focus on the genes of the rice plant. They study the role that genetics plays in the way rice plants respond to their environment. They created the Rice GT Database in order to integrate functional genomic information for putative rice Glycosyltransferases (GTs). This database contains information on nearly 800 putative rice GTs (gene models) identified by sequence similarity searches based on the Carbohydrate Active enZymes (CAZy) database. The Rice GT Database provides a platform to display user-selected functional genomic data on a phylogenetic tree. This includes sequence information, mutant line information, expression data, etc. An interactive chromosomal map shows the position of all rice GTs, and links to rice annotation databases are included. The format is intended to "facilitate the comparison of closely related GTs within different families, as well as perform global comparisons between sets of related families." [From http://ricephylogenomics.ucdavis.edu/cellwalls/gt/genInfo.shtml] See also the primary paper discussing this work: Peijian Cao, Laura E. Bartley, Ki-Hong Jung and Pamela C. Ronald. Construction of a Rice Glycosyltransferase Phylogenomic Database and Identification of Rice-Diverged Glycosyltransferases. Molecular Plant, 2008, 1(5): 858-877.

  12. Enhanced diversity and aflatoxigenicity in interspecific hybrids of Aspergillus flavus and Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and A. parasiticus are two of the most important aflatoxin-producing species that contaminate agricultural commodities worldwide. Both species are heterothallic and undergo sexual reproduction in laboratory crosses. Here, we examine the possibility of interspecific matings betwe...

  13. Developmental validation of the MiSeq FGx Forensic Genomics System for Targeted Next Generation Sequencing in Forensic DNA Casework and Database Laboratories.

    PubMed

    Jäger, Anne C; Alvarez, Michelle L; Davis, Carey P; Guzmán, Ernesto; Han, Yonmee; Way, Lisa; Walichiewicz, Paulina; Silva, David; Pham, Nguyen; Caves, Glorianna; Bruand, Jocelyne; Schlesinger, Felix; Pond, Stephanie J K; Varlaro, Joe; Stephens, Kathryn M; Holt, Cydne L

    2017-05-01

    Human DNA profiling using PCR at polymorphic short tandem repeat (STR) loci followed by capillary electrophoresis (CE) size separation and length-based allele typing has been the standard in the forensic community for over 20 years. Over the last decade, Next-Generation Sequencing (NGS) matured rapidly, bringing modern advantages to forensic DNA analysis. The MiSeq FGx™ Forensic Genomics System, comprised of the ForenSeq™ DNA Signature Prep Kit, MiSeq FGx™ Reagent Kit, MiSeq FGx™ instrument and ForenSeq™ Universal Analysis Software, uses PCR to simultaneously amplify up to 231 forensic loci in a single multiplex reaction. Targeted loci include Amelogenin, 27 common, forensic autosomal STRs, 24 Y-STRs, 7 X-STRs and three classes of single nucleotide polymorphisms (SNPs). The ForenSeq™ kit includes two primer sets: Amelogenin, 58 STRs and 94 identity informative SNPs (iiSNPs) are amplified using DNA Primer Set A (DPMA; 153 loci); if a laboratory chooses to generate investigative leads using DNA Primer Set B, amplification is targeted to the 153 loci in DPMA plus 22 phenotypic informative (piSNPs) and 56 biogeographical ancestry SNPs (aiSNPs). High-resolution genotypes, including detection of intra-STR sequence variants, are semi-automatically generated with the ForenSeq™ software. This system was subjected to developmental validation studies according to the 2012 Revised SWGDAM Validation Guidelines. A two-step PCR first amplifies the target forensic STR and SNP loci (PCR1); unique, sample-specific indexed adapters or "barcodes" are attached in PCR2. Approximately 1736 ForenSeq™ reactions were analyzed. Studies include DNA substrate testing (cotton swabs, FTA cards, filter paper), species studies from a range of nonhuman organisms, DNA input sensitivity studies from 1ng down to 7.8pg, two-person human DNA mixture testing with three genotype combinations, stability analysis of partially degraded DNA, and effects of five commonly encountered PCR
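
    The locus counts quoted in the abstract above are internally consistent, which a short tally confirms. The constant and function names here are illustrative assumptions; only the numbers come from the abstract.

```python
# Locus counts as stated in the abstract.
AMELOGENIN = 1
AUTOSOMAL_STRS = 27
Y_STRS = 24
X_STRS = 7
II_SNPS = 94  # identity informative SNPs
PI_SNPS = 22  # phenotypic informative SNPs
AI_SNPS = 56  # biogeographical ancestry SNPs

def primer_set_totals():
    """Tally the loci amplified by DNA Primer Set A and DNA Primer Set B."""
    strs = AUTOSOMAL_STRS + Y_STRS + X_STRS
    set_a = AMELOGENIN + strs + II_SNPS
    # Set B targets everything in Set A plus the two extra SNP classes.
    set_b = set_a + PI_SNPS + AI_SNPS
    return {"strs": strs, "primer_set_a": set_a, "primer_set_b": set_b}

totals = primer_set_totals()
print(totals)
```

    The tally gives 58 STRs, 153 loci for Primer Set A and 231 loci for Primer Set B, matching the figures in the abstract.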

  14. Taxonomic Characterization and Secondary Metabolite Profiling of Aspergillus Section Aspergillus Contaminating Feeds and Feedstuffs

    PubMed Central

    Greco, Mariana; Kemppainen, Minna; Pose, Graciela; Pardo, Alejandro

    2015-01-01

    Xerophilic fungal species of the genus Aspergillus are economically highly relevant due to their ability to grow on low water activity substrates causing spoilage of stored goods and animal feeds. These fungi can synthesize a variety of secondary metabolites, many of which show animal toxicity, creating a health risk for food production animals and to humans as final consumers, respectively. Animal feeds used for rabbit, chinchilla and rainbow trout production in Argentina were analysed for the presence of xerophilic Aspergillus section Aspergillus species. High isolation frequencies (>60%) were detected in all the studied rabbit and chinchilla feeds, while the rainbow trout feeds showed lower fungal charge (25%). These section Aspergillus contaminations comprised predominantly five taxa. Twenty isolates were subjected to taxonomic characterization using both ascospore SEM micromorphology and two independent DNA loci sequencing. The secondary metabolite profiles of the isolates were determined qualitatively by HPLC-MS. All the isolates produced neoechinulin A, 17 isolates were positive for cladosporin and echinulin, and 18 were positive for neoechinulin B. Physcion and preechinulin were detected in a minor proportion of the isolates. This is the first report describing the detailed species composition and the secondary metabolite profiles of Aspergillus section Aspergillus contaminating animal feeds. PMID:26364643

  15. Taxonomic Characterization and Secondary Metabolite Profiling of Aspergillus Section Aspergillus Contaminating Feeds and Feedstuffs.

    PubMed

    Greco, Mariana; Kemppainen, Minna; Pose, Graciela; Pardo, Alejandro

    2015-09-02

    Xerophilic fungal species of the genus Aspergillus are economically highly relevant due to their ability to grow on low water activity substrates causing spoilage of stored goods and animal feeds. These fungi can synthesize a variety of secondary metabolites, many of which show animal toxicity, creating a health risk for food production animals and to humans as final consumers, respectively. Animal feeds used for rabbit, chinchilla and rainbow trout production in Argentina were analysed for the presence of xerophilic Aspergillus section Aspergillus species. High isolation frequencies (>60%) were detected in all the studied rabbit and chinchilla feeds, while the rainbow trout feeds showed lower fungal charge (25%). These section Aspergillus contaminations comprised predominantly five taxa. Twenty isolates were subjected to taxonomic characterization using both ascospore SEM micromorphology and two independent DNA loci sequencing. The secondary metabolite profiles of the isolates were determined qualitatively by HPLC-MS. All the isolates produced neoechinulin A, 17 isolates were positive for cladosporin and echinulin, and 18 were positive for neoechinulin B. Physcion and preechinulin were detected in a minor proportion of the isolates. This is the first report describing the detailed species composition and the secondary metabolite profiles of Aspergillus section Aspergillus contaminating animal feeds.

  16. User Guidelines for the Brassica Database: BRAD.

    PubMed

    Wang, Xiaobo; Cheng, Feng; Wang, Xiaowu

    2016-01-01

    The genome sequence of Brassica rapa was first released in 2011. Since then, further Brassica genomes have been sequenced or are undergoing sequencing. It is therefore necessary to develop tools that help users to mine information from genomic data efficiently. This will greatly aid scientific exploration and breeding application, especially for those with low levels of bioinformatic training. Therefore, the Brassica database (BRAD) was built to collect, integrate, illustrate, and visualize Brassica genomic datasets. BRAD provides useful searching and data mining tools, and facilitates the search of gene annotation datasets, syntenic or non-syntenic orthologs, and flanking regions of functional genomic elements. It also includes genome-analysis tools such as BLAST and GBrowse. One of the important aims of BRAD is to build a bridge between Brassica crop genomes with the genome of the model species Arabidopsis thaliana, thus transferring the bulk of A. thaliana gene study information for use with newly sequenced Brassica crops.

  17. Use of UHPLC high-resolution Orbitrap mass spectrometry to investigate the genes involved in the production of secondary metabolites in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The fungus Aspergillus flavus is known for its ability to produce the toxic and carcinogenic aflatoxins in food and feed. While aflatoxins are of most concern, A. flavus is predicted to be capable of producing many more metabolites based on a study of its complete genome sequence. Some of these meta...

  18. An Aspergillus flavus secondary metabolic gene cluster containing a hybrid PKS-NRPS is necessary for synthesis of the 2-pyridones, leporins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the filamentous fungus, Aspergillus flavus, has been shown to harbor as many as 55 putative secondary metabolic gene clusters including the one responsible for production of the toxic and carcinogenic, polyketide synthase (PKS)-derived family of secondary metabolites termed aflatoxins....

  19. Functional characterization of a veA-dependent polyketide synthase gene in Aspergillus flavus necessary for the synthesis of asparasone, a sclerotium-specific pigment

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The filamentous fungus, Aspergillus flavus, produces the toxic and carcinogenic, polyketide synthase (PKS)-derived family of secondary metabolites termed aflatoxins. While analysis of the A. flavus genome has identified many other PKSs capable of producing secondary metabolites, to date, only a few ...

  20. Variation in copy number of the 28S rDNA of Aspergillus fumigatus measured by droplet digital PCR and analog quantitative real-time PCR.

    PubMed

    Alanio, Alexandre; Sturny-Leclère, Aude; Benabou, Marion; Guigue, Nicolas; Bretagne, Stéphane

    2016-08-01

    Droplet digital PCR (ddPCR) after DNA digestion yielded a 28S rDNA copy number of 61 to 86 copies/genome when testing 10 unrelated Aspergillus fumigatus isolates, higher than with quantitative PCR. Unfortunately, ddPCR after DNA digestion did not improve the sensitivity of our PCR assay when testing serum from patients with invasive aspergillosis.
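
    The abstract does not give the calculation behind the copy-number estimate. As general background, ddPCR quantification usually applies a Poisson correction to the fraction of positive droplets, and a per-genome copy number can then be taken as the ratio of a multicopy target to a single-copy reference. The sketch below assumes that standard approach; the function names and droplet counts are hypothetical and the authors' actual workflow may differ.

```python
import math

def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1.0 - positive / total)

def copies_per_genome(target_pos, target_total, ref_pos, ref_total):
    """Ratio of a multicopy target to a single-copy reference gene."""
    reference = copies_per_droplet(ref_pos, ref_total)
    if reference == 0:
        raise ValueError("reference assay has no positive droplets")
    return copies_per_droplet(target_pos, target_total) / reference

# Hypothetical droplet counts, not taken from the study.
ratio = copies_per_genome(9000, 15000, 200, 15000)
print(round(ratio, 1))
```

    The logarithm accounts for droplets that contain more than one template copy, so the estimate does not saturate as quickly as a simple positive-droplet fraction would.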

  1. The sexual state of Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The sexual state of Aspergillus parasiticus, a potent aflatoxin-producing fungus within section Flavi, is described. The production of nonostiolate ascocarps surrounded by a separate peridium within the stroma places the teleomorph in the genus Petromyces. Petromyces parasiticus differs from P. a...

  2. Kipukasins: Nucleoside derivatives from Aspergillus versicolor.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Seven new aroyl uridine derivatives (kipukasins A-G; 1-7) were isolated from solid-substrate fermentation cultures of two different Hawaiian isolates of Aspergillus versicolor. The structures of compounds 1-7 were determined by analysis of NMR and MS data. The nucleoside portion of lead compound 1...

  3. Recombination and cryptic heterokaryosis in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a pathogen of many agronomically important crops worldwide and can also cause human and animal diseases. A. flavus is the major producer of aflatoxins (AFs), which are carcinogenic secondary metabolites. In the United States, mycotoxins have been estimated to cause agricultur...

  4. Cyclopiazonic acid biosynthesis by Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cyclopiazonic acid (CPA) is an indole-tetramic acid mycotoxin produced by some strains of Aspergillus flavus. Characterization of the CPA biosynthesis gene cluster confirmed that formation of CPA is via a three-enzyme pathway. This review examines the structure and organization of the CPA genes, elu...

  5. Aspergillus flavus: The Major Producer of Aflatoxin

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is an opportunistic pathogen of crops. It is important because it produces aflatoxin as a secondary metabolite in the seeds of a number of crops both before and after harvest. Aflatoxin is a potent carcinogen that is highly regulated in most countries. In the field, aflatoxin i...

  6. Detection of Aspergillus fumigatus by polymerase chain reaction.

    PubMed Central

    Spreadbury, C; Holden, D; Aufauvre-Brown, A; Bainbridge, B; Cohen, J

    1993-01-01

    Aspergillus fumigatus is an opportunistic nosocomial pathogen causing an often fatal pneumonia, invasive aspergillosis (IA), in immunosuppressed patients. Oligonucleotide primers were used to amplify a 401-bp fragment spanning the 26S/intergenic spacer region of the rDNA complex of A. fumigatus by the polymerase chain reaction (PCR). The primers were highly sensitive and specific: as little as 1 pg of A. fumigatus genomic DNA could be detected, and the primers only amplified DNA from A. fumigatus and not any other fungal, bacterial, viral, or human DNA tested. Using the PCR, we were able to detect A. fumigatus DNA in lung homogenates from immunosuppressed mice experimentally infected with A. fumigatus but not from immunosuppressed uninfected controls. There was 93% correlation between the culture results and the PCR results. In a retrospective clinical study, the sensitivity of the PCR for the detection of A. fumigatus in clinical samples was confirmed by positive amplification in three of three culture-positive respiratory samples from confirmed cases of IA. Because isolation of Aspergillus spp. may reflect contamination and colonization without infection, the feasibility of using the PCR was evaluated by analyzing culture-negative samples from both immunosuppressed patients at high risk for IA and immunocompetent patients with other lung infections. Only 2 of 10 patients were culture negative and PCR positive in the high-risk group, and 2 of 7 patients were culture negative and PCR positive in the immunocompetent group. The results indicate that PCR detection might be a valuable adjunct to current laboratory methods to diagnose IA. PMID:8458955

  7. FluG affects secretion in colonies of Aspergillus niger.

    PubMed

    Wang, Fengfeng; Krijgsheld, Pauline; Hulsman, Marc; de Bekker, Charissa; Müller, Wally H; Reinders, Marcel; de Vries, Ronald P; Wösten, Han A B

    2015-01-01

    Colonies of Aspergillus niger are characterized by zonal heterogeneity in growth, sporulation, gene expression and secretion. For instance, the glucoamylase gene glaA is more highly expressed at the periphery of colonies when compared to the center. As a consequence, its encoded protein GlaA is mainly secreted at the outer part of the colony. Here, multiple copies of amyR were introduced in A. niger. Most transformants over-expressing this regulatory gene of amylolytic genes still displayed heterogeneous glaA expression and GlaA secretion. However, heterogeneity was abolished in transformant UU-A001.13 by expressing glaA and secreting GlaA throughout the mycelium. Sequencing the genome of UU-A001.13 revealed that transformation had been accompanied by deletion of part of the fluG gene and disrupting its 3' end by integration of a transformation vector. Inactivation of fluG in the wild-type background of A. niger also resulted in breakdown of starch under the whole colony. Asexual development of the ∆fluG strain was not affected, unlike what was previously shown in Aspergillus nidulans. Genes encoding proteins with a signal sequence for secretion, including part of the amylolytic genes, were more often downregulated in the central zone of maltose-grown ∆fluG colonies and upregulated in the intermediate part and periphery when compared to the wild-type. Together, these data indicate that FluG of A. niger is a repressor of secretion.

  8. Visualizing Genomic Annotations with the UCSC Genome Browser.

    PubMed

    Hung, Jui-Hung; Weng, Zhiping

    2016-11-01

    Genomic data and annotations are rapidly accumulating in databases such as the UCSC Genome Browser, NCBI, and Ensembl. Given the massive scale of these genomic databases, it is important to be able to easily retrieve known data and annotations of a specified genomic locus. For example, for a newly identified cis-regulatory element bound by a transcription factor, questions that immediately come to mind include whether the element is near a transcriptional start site and, if so, the name of the corresponding gene, and whether the histones or DNA at the locus are modified. The UCSC Genome Browser organizes data and annotations (called tracks) around the reference sequences or draft assemblies of many eukaryotic genomes and presents them using a powerful web-based graphical interface. This protocol describes how to use the UCSC Genome Browser to visualize selected tracks at specified genomic regions, download the data and annotations for further analysis, and retrieve multiple sequence alignments and their conservation scores.

  9. Isolation of Aspergillus spp. from the respiratory tract in critically ill patients: risk factors, clinical presentation and outcome

    PubMed Central

    Garnacho-Montero, José; Amaya-Villar, Rosario; Ortiz-Leyba, Carlos; León, Cristóbal; Álvarez-Lerma, Francisco; Nolla-Salas, Juan; Iruretagoyena, José R; Barcenilla, Fernando

    2005-01-01

    Introduction Our aims were to assess risk factors, clinical features, management and outcomes in critically ill patients in whom Aspergillus spp. were isolated from respiratory secretions, using a database from a study designed to assess fungal infections. Methods A multicentre prospective study was conducted over a 9-month period in 73 intensive care units (ICUs) and included patients with an ICU stay longer than 7 days. Tracheal aspirate and urine samples, and oropharyngeal and gastric swabs were collected and cultured each week. On admission to the ICU and at the initiation of antifungal therapy, the severity of illness was evaluated using the Acute Physiology and Chronic Health Evaluation II score. Retrospectively, isolation of Aspergillus spp. was considered to reflect colonization if the patient did not fulfil criteria for pneumonia, and infection if the patient met criteria for pulmonary infection and if the clinician in charge considered the isolation to be clinically valuable. Risk factors, antifungal use and duration of therapy were noted. Results Out of a total of 1756 patients, Aspergillus spp. were recovered in 36. Treatment with steroids (odds ratio = 4.5) and chronic obstructive pulmonary disease (odds ratio = 2.9) were significantly associated with Aspergillus spp. isolation in multivariate analysis. In 14 patients isolation of Aspergillus spp. was interpreted as colonization, in 20 it was interpreted as invasive aspergillosis, and two cases were not classified. The mortality rates were 50% in the colonization group and 80% in the invasive infection group. Autopsy was performed in five patients with clinically suspected infection and confirmed the diagnosis in all of these cases. Conclusion In critically ill patients, treatment should be considered if features of pulmonary infection are present and Aspergillus spp. are isolated from respiratory secretions. PMID:15987390

  10. REDIdb: the RNA editing database.

    PubMed

    Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

    2007-01-01

    The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.

  11. Atomic Databases

    NASA Astrophysics Data System (ADS)

    Mendoza, Claudio

    2000-10-01

    Atomic and molecular data are required in a variety of fields ranging from the traditional astronomy, atmospherics and fusion research to fast growing technologies such as lasers, lighting, low-temperature plasmas, plasma assisted etching and radiotherapy. In this context, there are some research groups, both theoretical and experimental, scattered round the world that attend to most of this data demand, but the implementation of atomic databases has grown independently out of sheer necessity. In some cases the latter has been associated with the data production process or with data centers involved in data collection and evaluation; but sometimes it has been the result of individual initiatives that have been quite successful. In any case, the development and maintenance of atomic databases call for a number of skills and an entrepreneurial spirit that are not usually associated with most physics researchers. In the present report we present some of the highlights in this area in the past five years and discuss what we think are some of the main issues that have to be addressed.

  12. Elucidation of primary metabolic pathways in Aspergillus species: orphaned research in characterizing orphan genes.

    PubMed

    Andersen, Mikael Rørdam

    2014-11-01

    Primary metabolism affects all phenotypical traits of filamentous fungi. Particular examples include reacting to extracellular stimuli, producing precursor molecules required for cell division and morphological changes as well as providing monomer building blocks for production of secondary metabolites and extracellular enzymes. In this review, all annotated genes from four Aspergillus species have been examined. In this process, it becomes evident that 80-96% of the genes (depending on the species) are still without verified function. A significant proportion of the genes with verified metabolic functions are assigned to secondary or extracellular metabolism, leaving only 2-4% of the annotated genes within primary metabolism. It is clear that primary metabolism has not received the same attention in the post-genomic area as many other research areas--despite its role at the very centre of cellular function. However, several methods can be employed to use the metabolic networks in tandem with comparative genomics to accelerate functional assignment of genes in primary metabolism. In particular, gaps in metabolic pathways can be used to assign functions to orphan genes. In this review, applications of this from the Aspergillus genes will be examined, and it is proposed that, where feasible, this should be a standard part of functional annotation of fungal genomes.

  13. GDP-mannose pyrophosphorylase is essential for cell wall integrity, morphogenesis and viability of Aspergillus fumigatus.

    PubMed

    Jiang, Hechun; Ouyang, Haomiao; Zhou, Hui; Jin, Cheng

    2008-09-01

    GDP-mannose pyrophosphorylase (GMPP) catalyses the synthesis of GDP-mannose, which is the precursor for the mannose residues in glycoconjugates, using mannose 1-phosphate and GTP as substrates. Repression of GMPP in yeast leads to phenotypes including cell lysis, defective cell wall, and failure of polarized growth and cell separation. Although several GMPPs have been isolated and characterized in filamentous fungi, the physiological consequences of their actions are not clear. In this study, Afsrb1, which is a homologue of yeast SRB1/PSA1/VIG9, was identified in the Aspergillus fumigatus genome. The Afsrb1 gene was expressed in Escherichia coli, and recombinant AfSrb1 was functionally confirmed as a GMPP. By the replacement of the native Afsrb1 promoter with an inducible Aspergillus nidulans alcA promoter, the conditional inactivation mutant strain YJ-gmpp was constructed. The presence of 3 % glucose completely blocked transcription of P(alcA)-Afsrb1, and was lethal to strain YJ-gmpp. Repression of Afsrb1 expression in strain YJ-gmpp led to phenotypes including hyphal lysis, defective cell wall, impaired polarity maintenance, and branching site selection. Also, rapid germination and reduced conidiation were documented. However, in contrast to yeast, strain YJ-gmpp retained the ability to direct polarity establishment and septation. Our results showed that the Afsrb1 gene is essential for cell wall integrity, morphogenesis and viability of Aspergillus fumigatus.

  14. Aspergillus hancockii sp. nov., a biosynthetically talented fungus endemic to southeastern Australian soils.

    PubMed

    Pitt, John I; Lange, Lene; Lacey, Alastair E; Vuong, Daniel; Midgley, David J; Greenfield, Paul; Bradbury, Mark I; Lacey, Ernest; Busk, Peter K; Pilgaard, Bo; Chooi, Yit-Heng; Piggott, Andrew M

    2017-01-01

    Aspergillus hancockii sp. nov., classified in Aspergillus subgenus Circumdati section Flavi, was originally isolated from soil in peanut fields near Kumbia, in the South Burnett region of southeast Queensland, Australia, and has since been found occasionally from other substrates and locations in southeast Australia. It is phylogenetically and phenotypically related most closely to A. leporis States and M. Chr., but differs in conidial colour, other minor features and particularly in metabolite profile. When cultivated on rice as an optimal substrate, A. hancockii produced an extensive array of 69 secondary metabolites. Eleven of the 15 most abundant secondary metabolites, constituting 90% of the total area under the curve of the HPLC trace of the crude extract, were novel. The genome of A. hancockii, approximately 40 Mbp, was sequenced, and mining it for genes encoding carbohydrate-degrading enzymes identified more than 370 genes in 114 gene clusters, demonstrating that A. hancockii has the capacity to degrade cellulose, hemicellulose, lignin, pectin, starch, chitin, cutin and fructan as nutrient sources. Like most Aspergillus species, A. hancockii exhibited a diverse secondary metabolite gene profile, encoding 26 polyketide synthase, 16 nonribosomal peptide synthase and 15 nonribosomal peptide synthase-like enzymes.

  15. Partial Reconstruction of the Ergot Alkaloid Pathway by Heterologous Gene Expression in Aspergillus nidulans

    PubMed Central

    Ryan, Katy L.; Moore, Christopher T.; Panaccione, Daniel G.

    2013-01-01

    Ergot alkaloids are pharmaceutically and agriculturally important secondary metabolites produced by several species of fungi. Ergot alkaloid pathways vary among different fungal lineages, but the pathway intermediate chanoclavine-I is evolutionarily conserved among ergot alkaloid producers. At least four genes, dmaW, easF, easE, and easC, are necessary for pathway steps prior to chanoclavine-I; however, the sufficiency of these genes for chanoclavine-I synthesis has not been established. A fragment of genomic DNA containing dmaW, easF, easE, and easC was amplified from the human-pathogenic, ergot alkaloid-producing fungus Aspergillus fumigatus and transformed into Aspergillus nidulans, a model fungus that does not contain any of the ergot alkaloid synthesis genes. HPLC and LC-MS analyses demonstrated that transformed A. nidulans strains produced chanoclavine-I and an earlier pathway intermediate. Aspergillus nidulans transformants containing dmaW, easF, and either easE or easC did not produce chanoclavine-I but did produce an early pathway intermediate and, in the case of the easC transformant, an additional ergot alkaloid-like compound. We conclude that dmaW, easF, easE, and easC are sufficient for the synthesis of chanoclavine-I in A. nidulans and expressing ergot alkaloid pathway genes in A. nidulans provides a novel approach to understanding the early steps in ergot alkaloid synthesis. PMID:23435153

  16. Identifying and characterizing the most significant β-glucosidase of the novel species Aspergillus saccharolyticus

    SciTech Connect

    Sorensen, Anette; Ahring, Birgitte K.; Lubeck, Mette; Ubhayasekera, Wimal; Bruno, Kenneth S.; Culley, David E.; Lubeck, Peter S.

    2012-08-20

    A newly discovered fungal species, Aspergillus saccharolyticus, was found to produce a culture broth rich in beta-glucosidase activity. In this present work, the main beta-glucosidase of A. saccharolyticus responsible for the efficient hydrolytic activity was identified, isolated, and characterized. Ion exchange chromatography was used to fractionate the culture broth, yielding fractions with high beta-glucosidase activity and only one visible band on an SDS-PAGE gel. Mass spectrometry analysis of this band gave peptide matches to beta-glucosidases from aspergilli. Through a PCR approach using degenerate primers and genome walking, a 2919 base pair sequence encoding the 860 amino acid BGL1 polypeptide was determined. BGL1 of A. saccharolyticus has 91% and 82% identity with BGL1 from Aspergillus aculeatus and BGL1 from Aspergillus niger, respectively, both belonging to Glycoside hydrolase family 3. Homology modeling studies suggested beta-glucosidase activity with preserved retaining mechanism and a wider catalytic pocket compared to other beta-glucosidases. The bgl1 gene was heterologously expressed in Trichoderma reesei QM6a, purified, and characterized by enzyme kinetics studies. The enzyme can hydrolyze cellobiose, pNPG, and cellodextrins. The enzyme showed good thermostability, was stable at 50°C, and at 60°C it had a half-life of approximately 6 hours.

  17. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  18. Aflatoxigenic Aspergillus flavus and Aspergillus parasiticus strains in Hungarian maize fields.

    PubMed

    Sebők, Flóra; Dobolyi, Csaba; Zágoni, Dóra; Risa, Anita; Krifaton, Csilla; Hartman, Mátyás; Cserháti, Mátyás; Szoboszlay, Sándor; Kriszt, Balázs

    2016-12-01

    Due to climate change, aflatoxigenic Aspergillus species and strains have appeared in several European countries, contaminating different agricultural commodities with aflatoxin. Our aim was to screen for the presence of aflatoxigenic fungi in maize fields throughout the seven geographic regions of Hungary. Fungi belonging to Aspergillus section Flavi were isolated in the ratio of 26.9% and 42.3% from soil and maize samples in 2013, and these ratios decreased to 16.1% and 34.7% in 2014. Based on morphological characteristics and the sequence analysis of the partial calmodulin gene, all isolates proved to be Aspergillus flavus, except four strains, which were identified as Aspergillus parasiticus. About half of the A. flavus strains and all the A. parasiticus strains were able to synthesize aflatoxins. Aflatoxigenic Aspergillus strains were isolated from all the seven regions of Hungary. A. parasiticus strains were found in the soil of the regions Southern Great Plain and Southern Transdanubia and in a maize sample of the region Western Transdanubia. In spite of the fact that aflatoxins have rarely been detected in feeds and foods in Hungary, aflatoxigenic A. flavus and A. parasiticus strains are present in the maize culture throughout Hungary, posing a potential threat to food safety.

  19. Aspergillus pacemaker endocarditis presenting as pulmonary embolism.

    PubMed

    Mateos-Colino, A; Golpe, R; González-Rodríguez, A; González-Juanatey, C; Legarra, J J; Blanco, M

    2005-06-01

    Pacemaker endocarditis (PME) is a rare but severe complication of endocardial pacemaker implantation. Fungal PME is extremely uncommon. The case of a 66-year-old female patient who was diagnosed as having a pulmonary embolus based upon the patient's clinical presentation and computed tomography angiography findings is presented. Transthoracic echocardiography demonstrated a huge vegetation attached to the pacemaker wire. The pacemaker system was removed surgically during cardiovascular bypass. The vegetation was cultured, the results of which were positive for Aspergillus spp. No risk factors for Aspergillus infection were found in the patient. She was treated with liposomal amphotericin B for 3 weeks, followed by itraconazole for 40 weeks. One year later, the patient remains asymptomatic.

  20. Aspergillus bertholletius sp. nov. from Brazil Nuts

    PubMed Central

    Taniwaki, Marta H.; Pitt, John I.; Iamanaka, Beatriz T.; Sartori, Daniele; Copetti, Marina V.; Balajee, Arun; Fungaro, Maria Helena P.; Frisvad, Jens C.

    2012-01-01

    During a study on the mycobiota of brazil nuts (Bertholletia excelsa) in Brazil, a new Aspergillus species, A. bertholletius, was found, and is described here. A polyphasic approach was applied using morphological characters, extrolite data as well as partial β-tubulin, calmodulin and ITS sequences to characterize this taxon. A. bertholletius is represented by nineteen isolates from samples of brazil nuts at various stages of production and soil close to Bertholletia excelsa trees. The following extrolites were produced by this species: aflavinin, cyclopiazonic acid, kojic acid, tenuazonic acid and ustilaginoidin C. Phylogenetic analysis using partial β-tubulin and calmodulin gene sequences showed that A. bertholletius represents a new phylogenetic clade in Aspergillus section Flavi. The type strain of A. bertholletius is CCT 7615 ( = ITAL 270/06 = IBT 29228). PMID:22952594

  1. Aspergillus bertholletius sp. nov. from Brazil nuts.

    PubMed

    Taniwaki, Marta H; Pitt, John I; Iamanaka, Beatriz T; Sartori, Daniele; Copetti, Marina V; Balajee, Arun; Fungaro, Maria Helena P; Frisvad, Jens C

    2012-01-01

    During a study on the mycobiota of brazil nuts (Bertholletia excelsa) in Brazil, a new Aspergillus species, A. bertholletius, was found, and is described here. A polyphasic approach was applied using morphological characters, extrolite data as well as partial β-tubulin, calmodulin and ITS sequences to characterize this taxon. A. bertholletius is represented by nineteen isolates from samples of brazil nuts at various stages of production and soil close to Bertholletia excelsa trees. The following extrolites were produced by this species: aflavinin, cyclopiazonic acid, kojic acid, tenuazonic acid and ustilaginoidin C. Phylogenetic analysis using partial β-tubulin and calmodulin gene sequences showed that A. bertholletius represents a new phylogenetic clade in Aspergillus section Flavi. The type strain of A. bertholletius is CCT 7615 ( = ITAL 270/06 = IBT 29228).

  2. Chronic bilateral otomycosis caused by Aspergillus niger.

    PubMed

    Mishra, G S; Mehta, Niral; Pal, M

    2004-02-01

    Aspergillus niger, an opportunistic filamentous fungus, was identified as the cause of chronic bilateral otomycosis in a 46-year-old female patient who was unresponsive to different drugs. The patient showed signs of erythema, otalgia, itching, otorrhoea and presence of greyish black coloured mass in both the ear canals. The direct microscopical examination of the ear debris in potassium hydroxide preparations, Giemsa, phase contrast and Gram revealed many thin, branched septate hyphae, conidia and conidiophores morphologically indistinguishable from Aspergillus spp. The histopathological section of the ear wax mass by haematoxylin and eosin and periodic acid-Schiff techniques also showed similar fungal elements. The patient responded to 1% solution of mercurochrome. The use of mercurochrome in developing countries like India may be recommended to treat the fungal otitis in patients. We also emphasize that 'Narayan' stain should be routinely employed by microbiology and public health laboratories to study the morphology of pathogenic fungi.

  3. Identification of a Novel L-rhamnose Uptake Transporter in the Filamentous Fungus Aspergillus niger

    PubMed Central

    Sloothaak, Jasper; Odoni, Dorett I.; Martins dos Santos, Vitor A. P.; Schaap, Peter J.

    2016-01-01

    The study of plant biomass utilization by fungi is a research field of great interest due to its many implications in ecology, agriculture and biotechnology. Most of the efforts done to increase the understanding of the use of plant cell walls by fungi have been focused on the degradation of cellulose and hemicellulose, and transport and metabolism of their constituent monosaccharides. Pectin is another important constituent of plant cell walls, but has received less attention. In relation to the uptake of pectic building blocks, fungal transporters for the uptake of galacturonic acid recently have been reported in Aspergillus niger and Neurospora crassa. However, not a single L-rhamnose (6-deoxy-L-mannose) transporter has been identified yet in fungi or in other eukaryotic organisms. L-rhamnose is a deoxy-sugar present in plant cell wall pectic polysaccharides (mainly rhamnogalacturonan I and rhamnogalacturonan II), but is also found in diverse plant secondary metabolites (e.g. anthocyanins, flavonoids and triterpenoids), in the green seaweed sulfated polysaccharide ulvan, and in glycan structures from viruses and bacteria. Here, a comparative plasmalemma proteomic analysis was used to identify candidate L-rhamnose transporters in A. niger. Further analysis was focused on protein ID 1119135 (RhtA) (JGI A. niger ATCC 1015 genome database). RhtA was classified as a Family 7 Fucose: H+ Symporter (FHS) within the Major Facilitator Superfamily. Family 7 currently includes exclusively bacterial transporters able to use different sugars. Strong indications for its role in L-rhamnose transport were obtained by functional complementation of the Saccharomyces cerevisiae EBY.VW.4000 strain in growth studies with a range of potential substrates. Biochemical analysis using L-[3H(G)]-rhamnose confirmed that RhtA is a L-rhamnose transporter. The RhtA gene is located in tandem with a hypothetical alpha-L-rhamnosidase gene (rhaB). Transcriptional analysis of rhtA and rha

  4. MaizeGDB, the maize model organism database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the maize research community's database for maize genetic and genomic information. In this seminar I will outline our current endeavors including a full website redesign, the status of maize genome assembly and annotation projects, and work toward genome functional annotation. Mechanis...

  5. Aspergillus deflectus infection in four dogs.

    PubMed

    Jang, S S; Dorr, T E; Biberstein, E L; Wong, A

    1986-04-01

    Four cases of disseminated aspergillosis caused by Aspergillus deflectus in German Shepherds are presented. Three of the cases, which involved multiple organs, terminated in euthanasia. One case, with bony involvement of the limbs and skull, lived. The unique morphological characteristic of the conidial head resembling a briar pipe led to the identification of A. deflectus. To the authors' knowledge these are the first reported cases of infections caused by A. deflectus in man or animal.

  6. Aspergillus thyroiditis in a renal transplant recipient mimicking subacute thyroiditis.

    PubMed

    Solak, Y; Atalay, H; Nar, A; Ozbek, O; Turkmen, K; Erekul, S; Turk, S

    2011-04-01

    Fungal pathogens are increasingly encountered after renal transplantation. Aspergillus causes significant morbidity and mortality in transplant patients. Fungal thyroiditis is a rare occurrence owing to unique features of the thyroid gland. Most cases are caused by Aspergillus species and have been described in immunocompromised patients. Presentation may be identical with that of subacute thyroiditis, in which hyperthyroidism features and painful thyroid are the prominent findings. Diagnosis can be ascertained by fine-needle aspiration of thyroid showing branching hyphae of Aspergillus. We describe a renal transplant patient who developed Aspergillus thyroiditis as part of a disseminated infection successfully treated with voriconazole.

  7. Prospecting for the incidence of genes involved in ochratoxin and fumonisin biosynthesis in Brazilian strains of Aspergillus niger and Aspergillus welwitschiae.

    PubMed

    Massi, Fernanda Pelisson; Sartori, Daniele; de Souza Ferranti, Larissa; Iamanaka, Beatriz Thie; Taniwaki, Marta Hiromi; Vieira, Maria Lucia Carneiro; Fungaro, Maria Helena Pelegrinelli

    2016-03-16

    Aspergillus niger "aggregate" is an informal taxonomic rank that represents a group of species from the section Nigri. Among A. niger "aggregate" species Aspergillus niger sensu stricto and its cryptic species Aspergillus welwitschiae (=Aspergillus awamori sensu Perrone) are proven as ochratoxin A and fumonisin B2 producing species. A. niger has been frequently found in tropical and subtropical foods. A. welwitschiae is a new species, which was recently dismembered from the A. niger taxon. These species are morphologically very similar and molecular data are indispensable for their identification. A total of 175 Brazilian isolates previously identified as A. niger collected from dried fruits, Brazil nuts, coffee beans, grapes, cocoa and onions were investigated in this study. Based on partial calmodulin gene sequences about one-half of our isolates were identified as A. welwitschiae. This new species was the predominant species in onions analyzed in Brazil. A. niger and A. welwitschiae differ in their ability to produce ochratoxin A and fumonisin B2. Among A. niger isolates, approximately 32% were OTA producers, but in contrast only 1% of the A. welwitschiae isolates revealed the ability to produce ochratoxin A. Regarding fumonisin B2 production, there was a higher frequency of FB2 producing isolates in A. niger (74%) compared to A. welwitschiae (34%). Because not all A. niger and A. welwitschiae strains produce ochratoxin A and fumonisin B2, in this study a multiplex PCR was developed for detecting the presence of essential genes involved in ochratoxin (polyketide synthase and radH flavin-dependent halogenase) and fumonisin (α-oxoamine synthase) biosynthesis in the genome of A. niger and A. welwitschiae isolates. The frequency of strains harboring the mycotoxin genes was markedly different between A. niger and A. welwitschiae. All OTA producing isolates of A. niger and A. welwitschiae showed in their genome the pks and radH genes, and 95.2% of the nonproducing

  8. Genomics and Health Impact Update

    MedlinePlus

    ... Publications Birth Defects/ Child Health Cancer Cardiovascular Diseases Chronic Disease Ethics, Policy and Law Genomics in Practice Newborn Screening Pharmacogenomics Reproductive Health Tools/ Databases AMD Clips News Concepts/ Comments Pathogenicity/ Antimicrobial Resistance Epidemiology/ ...

  9. Antibiotic Extraction as a Recent Biocontrol Method for Aspergillus Niger and Aspergillus Flavus Fungi in Ancient Egyptian mural paintings

    NASA Astrophysics Data System (ADS)

    Hemdan, R. Elmitwalli; Fatma, Helmi M.; Rizk, Mohammed A.; Hagrassy, Abeer F.

    Biodeterioration of mural paintings by the fungi Aspergillus niger and Aspergillus flavus has been demonstrated in various mural paintings in Egypt. Several studies have examined the effect of fungi on mural paintings, the mechanism of interaction and methods of control, but none offers a solution free of side effects. In this paper, for the first time, a recent treatment with the antibiotic "6 penthyl α pyrone phenol" was applied as a successful technique for eliminating Aspergillus niger and Aspergillus flavus. It is also suitable for cleaning the surfaces of murals executed in the tempera technique of the fungal metabolic products that cause black pigmentation on those surfaces.

  10. GOTTCHA Database, Version 1

    SciTech Connect

    Freitas, Tracey; Chain, Patrick; Lo, Chien-Chi; Li, Po-E

    2015-08-03

    One major challenge in the field of shotgun metagenomics is the accurate identification of the organisms present within the community, based on classification of short sequence reads. Though microbial community profiling methods have emerged to attempt to rapidly classify the millions of reads output from contemporary sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling tool with significantly smaller FDR, which is also capable of classifying never-before-seen genomes into the appropriate parent taxa. The algorithm is based upon three primary computational phases: (I) genomic decomposition into bit vectors, (II) bit vector intersections to identify shared regions, and (III) bit vector subtractions to remove shared regions and reveal unique, signature regions. In the Decomposition phase, genomic data is first masked to highlight only the valid (non-ambiguous) regions and then decomposed into overlapping 24-mers. The k-mers are sorted along with their start positions, de-replicated, and then prefixed, to minimize data duplication. The prefixes are indexed and an identical data structure is created for the start positions to mimic that of the k-mer data structure. During the Intersection phase -- which is the most computationally intensive phase -- as an all-vs-all comparison is made, the number of comparisons is first reduced by four methods: (a) Prefix restriction, (b) Overlap detection, (c) Overlap restriction, and (d) Result recording. In Prefix restriction, only k-mers of the same prefix are compared. Within that group, potential overlap of k-mer suffixes that would result in a non-empty set intersection are screened for. If such an overlap exists, the region which intersects is
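
    The three phases described in this record (decomposition, intersection, subtraction) can be illustrated with a minimal sketch. It is not the GOTTCHA implementation: it assumes plain Python sets in place of bit vectors, skips the prefix indexing and overlap restriction steps, and the function names `decompose` and `unique_signatures` are hypothetical.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")  # assumption: anything else counts as ambiguous


def decompose(seq, k=24):
    """Decomposition: map each unambiguous overlapping k-mer to its start positions."""
    kmers = defaultdict(list)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= VALID_BASES:  # mask k-mers touching ambiguous bases
            kmers[kmer].append(start)
    return dict(kmers)


def unique_signatures(genomes, k=24):
    """Return, per genome, the k-mers that occur in no other genome."""
    decomposed = {name: set(decompose(seq, k)) for name, seq in genomes.items()}
    signatures = {}
    for name, kmers in decomposed.items():
        shared = set()
        for other, other_kmers in decomposed.items():
            if other != name:
                shared |= kmers & other_kmers  # Intersection: shared regions
        signatures[name] = kmers - shared      # Subtraction: keep unique regions
    return signatures


if __name__ == "__main__":
    # Toy sequences and k=4, chosen only to keep the example readable.
    toy = {"genome_a": "ACGTACGTNN", "genome_b": "ACGTTTTT"}
    for name, sig in unique_signatures(toy, k=4).items():
        print(name, sorted(sig))
```

    With the toy input, the shared 4-mer ACGT is removed from both genomes, and the k-mers overlapping the ambiguous N bases never enter the comparison.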

  11. Linking Virus Genomes with Host Taxonomy

    PubMed Central

    Mihara, Tomoko; Nishimura, Yosuke; Shimizu, Yugo; Nishiyama, Hiroki; Yoshikawa, Genki; Uehara, Hideya; Hingamp, Pascal; Goto, Susumu; Ogata, Hiroyuki

    2016-01-01

    Environmental genomics can describe all forms of organisms—cellular and viral—present in a community. The analysis of such eco-systems biology data relies heavily on reference databases, e.g., taxonomy or gene function databases. Reference databases of symbiosis sensu lato, although essential for the analysis of organism interaction networks, are lacking. By mining existing databases and literature, we here provide a comprehensive and manually curated database of taxonomic links between viruses and their cellular hosts. PMID:26938550

  12. Linking Virus Genomes with Host Taxonomy.

    PubMed

    Mihara, Tomoko; Nishimura, Yosuke; Shimizu, Yugo; Nishiyama, Hiroki; Yoshikawa, Genki; Uehara, Hideya; Hingamp, Pascal; Goto, Susumu; Ogata, Hiroyuki

    2016-03-01

    Environmental genomics can describe all forms of organisms--cellular and viral--present in a community. The analysis of such eco-systems biology data relies heavily on reference databases, e.g., taxonomy or gene function databases. Reference databases of symbiosis sensu lato, although essential for the analysis of organism interaction networks, are lacking. By mining existing databases and literature, we here provide a comprehensive and manually curated database of taxonomic links between viruses and their cellular hosts.

  13. Efficient degradation of tannic acid by black Aspergillus species.

    PubMed

    Van Diepeningen, Anne D; Debets, Alfons J M; Varga, Janos; van der Gaag, Marijn; Swart, Klaas; Hoekstra, Rolf F

    2004-08-01

    A set of Aspergillus strains from culture collections and wild-type black aspergilli isolated on non-selective media were used to validate the use of media with 20% tannic acid for exclusive and complete selection of the black aspergilli. The 20% tannic acid medium proved useful for both quantitative and qualitative selection of all different black aspergilli, including all recognized species: A. carbonarius, A. japonicus, A. aculeatus, A. foetidus, A. heteromorphus, A. niger, A. tubingensis and A. brasiliensis haplotypes. Even higher concentrations of tannic acid can be utilized by the black aspergilli, suggesting a very efficient tannic acid-degrading system. Colour mutants show that the characteristic ability to grow on high tannic acid concentrations is not causally linked to the other typical feature of these aspergilli, i.e. the formation of brown-black pigments. Sequence analysis of the A. niger genome using the A. oryzae tannase gene yielded eleven tannase-like genes, far more than in related species. Therefore, a unique ecological niche in the degradation of tannic acid and connected nitrogen release seems to be reserved for these black-spored cosmopolitans.

  14. In vitro biosynthesis of glycosylphosphatidylinositol in Aspergillus fumigatus.

    PubMed

    Fontaine, Thierry; Smith, Terry K; Crossman, Arthur; Brimacombe, John S; Latgé, Jean-Paul; Ferguson, Michael A J

    2004-12-07

    Glycosylphosphatidylinositol (GPI) represents a mechanism for the attachment of proteins to the plasma membrane found in all eukaryotic cells. GPI biosynthesis has been mainly studied in parasites, yeast, and mammalian cells. Aspergillus fumigatus, a filamentous fungus, produces GPI-anchored molecules, some of them being essential in the construction of the cell wall. An in vitro assay was used to study the GPI biosynthesis in the mycelium form of this organism. In the presence of UDP-GlcNAc and coenzyme A, the cell-free system produces the initial intermediates of the GPI biosynthesis: GlcNAc-PI, GlcN-PI, and GlcN-(acyl)PI. Using GDP-Man, two types of mannosylation are observed. First, one or two mannose residues are added to GlcN-PI. This mannosylation, never described in fungi, does not require dolichol phosphomannoside (Dol-P-Man) as the monosaccharide donor. Second, one to five mannose residues are added to GlcN-(acyl)PI using Dol-P-Man as the mannose donor. The addition of ethanolamine phosphate groups to the first, second, and third mannose residue is also observed. This latter series of GPI intermediates identified in the A. fumigatus cell-free system indicates that GPI biosynthesis in this filamentous fungus is similar to the mammalian or yeast systems. Thus, these biochemical data are in agreement with a comparative genome analysis that shows that all but 3 of the 21 genes described in the Saccharomyces cerevisiae GPI pathways are found in A. fumigatus.

  15. Regulation of Conidiation by Light in Aspergillus nidulans

    PubMed Central

    Ruger-Herreros, Carmen; Rodríguez-Romero, Julio; Fernández-Barranco, Raul; Olmedo, María; Fischer, Reinhard; Corrochano, Luis M.; Canovas, David

    2011-01-01

    Light regulates several aspects of the biology of many organisms, including the balance between asexual and sexual development in some fungi. To understand how light regulates fungal development at the molecular level we have used Aspergillus nidulans as a model. We have performed a genome-wide expression analysis that has allowed us to identify >400 genes upregulated and >100 genes downregulated by light in developmentally competent mycelium. Among the upregulated genes were genes required for the regulation of asexual development, one of the major biological responses to light in A. nidulans, which is a pathway controlled by the master regulatory gene brlA. The expression of brlA, like conidiation, is induced by light. A detailed analysis of brlA light regulation revealed increased expression after short exposures with a maximum after 60 min of light followed by photoadaptation with longer light exposures. In addition to brlA, genes flbA–C and fluG are also light regulated, and flbA–C are required for the correct light-dependent regulation of the upstream regulator fluG. We have found that light induction of brlA required the photoreceptor complex composed of a phytochrome FphA, and the white-collar homologs LreA and LreB, and the fluffy genes flbA–C. We propose that the activation of regulatory genes by light is the key event in the activation of asexual development by light in A. nidulans. PMID:21624998

  16. Mapping the polysaccharide degradation potential of Aspergillus niger

    PubMed Central

    2012-01-01

    Background: The degradation of plant materials by enzymes is an industry of increasing importance. For sustainable production of second generation biofuels and other products of industrial biotechnology, efficient degradation of non-edible plant polysaccharides such as hemicellulose is required. For each type of hemicellulose, a complex mixture of enzymes is required for complete conversion to fermentable monosaccharides. In plant-biomass degrading fungi, these enzymes are regulated and released by complex regulatory structures. In this study, we present a methodology for evaluating the potential of a given fungus for polysaccharide degradation. Results: Through the compilation of information from 203 articles, we have systematized knowledge on the structure and degradation of 16 major types of plant polysaccharides to form a graphical overview. As a case example, we have combined this with a list of 188 genes coding for carbohydrate-active enzymes from Aspergillus niger, thus forming an analysis framework, which can be queried. Combination of this information network with gene expression analysis on mono- and polysaccharide substrates has allowed elucidation of concerted gene expression from this organism. One such example is the identification of a full set of extracellular polysaccharide-acting genes for the degradation of oat spelt xylan. Conclusions: The mapping of plant polysaccharide structures along with the corresponding enzymatic activities is a powerful framework for expression analysis of carbohydrate-active enzymes. Applying this network-based approach, we provide the first genome-scale characterization of all genes coding for carbohydrate-active enzymes identified in A. niger. PMID:22799883

  17. Transcriptome analysis of Aspergillus niger grown on sugarcane bagasse

    PubMed Central

    2011-01-01

    Background: Considering that the costs of cellulases and hemicellulases contribute substantially to the price of bioethanol, new studies aimed at understanding and improving cellulase efficiency and productivity are of paramount importance. Aspergillus niger has been shown to produce a wide spectrum of polysaccharide hydrolytic enzymes. To understand how to improve enzymatic cocktails that can hydrolyze pretreated sugarcane bagasse, we used a genomics approach to investigate which genes and pathways are transcriptionally modulated during growth of A. niger on steam-exploded sugarcane bagasse (SEB). Results: Herein we report the main cellulase- and hemicellulase-encoding genes with increased expression during growth on SEB. We also sought to determine whether the mRNA accumulation of several SEB-induced genes encoding putative transporters is induced by xylose and dependent on glucose. We identified 18 (58% of A. niger predicted cellulases) and 21 (58% of A. niger predicted hemicellulases) cellulase- and hemicellulase-encoding genes, respectively, that were highly expressed during growth on SEB. Conclusions: Degradation of sugarcane bagasse requires production of many different enzymes which are regulated by the type and complexity of the available substrate. Our presently reported work opens new possibilities for understanding sugarcane biomass saccharification by A. niger hydrolases and for the construction of more efficient enzymatic cocktails for second-generation bioethanol. PMID:22008461

  18. Secretome of Aspergillus oryzae in Shaoxing rice wine koji.

    PubMed

    Zhang, Bo; Guan, Zheng-Bing; Cao, Yu; Xie, Guang-Fa; Lu, Jian

    2012-04-16

    Shaoxing rice wine is the most famous and representative Chinese rice wine. Aspergillus oryzae SU16 is used in the manufacture of koji, the Shaoxing rice wine starter culture. In the current study, a comprehensive analysis of the secretome profile of A. oryzae SU16 in Shaoxing rice wine koji was performed for the first time. The proteomic analysis for the identification of the secretory proteins was done using two-dimensional electrophoresis combined with matrix-assisted laser desorption/ionization-tandem time of flight mass spectrometry based on the annotated A. oryzae genome sequence. A total of 41 unique proteins were identified from the secretome. These proteins included 17 extracellular proteins following the classical secretory pathway, and 10 extracellular proteins putatively secreted by the non-classical secretory pathway. The present secretome profile greatly differed from previous reports on A. oryzae growing in other solid-state nutrient sources. Several new secretory or putative secretory proteins were also found. These proteomic data will significantly aid the advancement of research on the secretome of A. oryzae, especially in solid-state cultures, and in elucidating the production process mechanism of Shaoxing rice wine koji. The findings may promote the technological development and innovation of the Shaoxing rice wine industry.

  19. Plant-like biosynthesis of isoquinoline alkaloids in Aspergillus fumigatus

    PubMed Central

    Baccile, Joshua A.; Spraker, Joseph E.; Le, Henry H.; Brandenburger, Eileen; Gomez, Christian; Bok, Jin Woo; Macheleidt, Juliane; Brakhage, Axel A.; Hoffmeister, Dirk; Keller, Nancy P.; Schroeder, Frank C.

    2016-01-01

    Natural product discovery efforts have focused primarily on microbial biosynthetic gene clusters (BGCs) containing large multi-modular PKSs and NRPSs; however, sequencing of fungal genomes has revealed a vast number of BGCs containing smaller NRPS-like genes of unknown biosynthetic function. Using comparative metabolomics, we show that a BGC in the human pathogen Aspergillus fumigatus named fsq, which contains an NRPS-like gene lacking a condensation domain, produces several novel isoquinoline alkaloids, the fumisoquins. These compounds derive from carbon-carbon bond formation between two amino acid-derived moieties followed by a sequence that is directly analogous to isoquinoline alkaloid biosynthesis in plants. Fumisoquin biosynthesis requires the N-methyltransferase FsqC and the FAD-dependent oxidase FsqB, which represent functional analogs of coclaurine N-methyltransferase and berberine bridge enzyme in plants. Our results show that BGCs containing incomplete NRPS modules may reveal new biosynthetic paradigms and suggest that plant-like isoquinoline biosynthesis occurs in diverse fungi. PMID:27065235

  20. SENTRA, a database of signal transduction proteins.

    SciTech Connect

    D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL

    2000-01-01

    SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.

  1. Identification of two aflatrem biosynthesis gene loci in Aspergillus flavus and metabolic engineering of Penicillium paxilli to elucidate their function.

    PubMed

    Nicholson, Matthew J; Koulman, Albert; Monahan, Brendon J; Pritchard, Beth L; Payne, Gary A; Scott, Barry

    2009-12-01

    Aflatrem is a potent tremorgenic toxin produced by the soil fungus Aspergillus flavus, and a member of a structurally diverse group of fungal secondary metabolites known as indole-diterpenes. Gene clusters for indole-diterpene biosynthesis have recently been described in several species of filamentous fungi. A search of Aspergillus complete genome sequence data identified putative aflatrem gene clusters in the genomes of A. flavus and Aspergillus oryzae. In both species the genes for aflatrem biosynthesis cluster at two discrete loci; the first, ATM1, is telomere proximal on chromosome 5 and contains a cluster of three genes, atmG, atmC, and atmM, and the second, ATM2, is telomere distal on chromosome 7 and contains five genes, atmD, atmQ, atmB, atmA, and atmP. Reverse transcriptase PCR in A. flavus demonstrated that aflatrem biosynthesis transcript levels increased with the onset of aflatrem production. Transfer of atmP and atmQ into Penicillium paxilli paxP and paxQ deletion mutants, known to accumulate paxilline intermediates paspaline and 13-desoxypaxilline, respectively, showed that AtmP is a functional homolog of PaxP and that AtmQ utilizes 13-desoxypaxilline as a substrate to synthesize aflatrem pathway-specific intermediates, paspalicine and paspalinine. We propose a scheme for aflatrem biosynthesis in A. flavus based on these reconstitution experiments in P. paxilli and identification of putative intermediates in wild-type cultures of A. flavus.

  2. Biodiversity of Aspergillus Species in Some Important Agricultural Products

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genus Aspergillus is one of the most important filamentous fungal genera. Aspergillus species are used in the fermentation industry, but they are also responsible for various secondary rots of plants and food, with the consequence of possible accumulation of mycotoxins. The aflatoxin-producing A. fl...

  3. Phylogeny, identification and nomenclature of the genus Aspergillus

    PubMed Central

    Samson, R.A.; Visagie, C.M.; Houbraken, J.; Hong, S.-B.; Hubka, V.; Klaassen, C.H.W.; Perrone, G.; Seifert, K.A.; Susca, A.; Tanney, J.B.; Varga, J.; Kocsubé, S.; Szigeti, G.; Yaguchi, T.; Frisvad, J.C.

    2014-01-01

    Aspergillus comprises a diverse group of species based on morphological, physiological and phylogenetic characters, which significantly impact biotechnology, food production, indoor environments and human health. Aspergillus was traditionally associated with nine teleomorph genera, but phylogenetic data suggest that together with genera such as Polypaecilum, Phialosimplex, Dichotomomyces and Cristaspora, Aspergillus forms a monophyletic clade closely related to Penicillium. Changes in the International Code of Nomenclature for algae, fungi and plants resulted in the move to one name per species, meaning that a decision had to be made whether to keep Aspergillus as one big genus or to split it into several smaller genera. The International Commission of Penicillium and Aspergillus decided to keep Aspergillus instead of using smaller genera. In this paper, we present the arguments for this decision. We introduce new combinations for accepted species presently lacking an Aspergillus name and provide an updated accepted species list for the genus, now containing 339 species. To add to the scientific value of the list, we include information about living ex-type culture collection numbers and GenBank accession numbers for available representative ITS, calmodulin, β-tubulin and RPB2 sequences. In addition, we recommend a standard working technique for Aspergillus and propose calmodulin as a secondary identification marker. PMID:25492982

  4. Clonality and sex impact aflatoxigenicity in Aspergillus populations

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species in Aspergillus section Flavi commonly infect agricultural staples such as corn, peanuts, cottonseed, and tree nuts and produce an array of mycotoxins, the most potent of which are aflatoxins. Aspergillus flavus is the dominant aflatoxin-producing species in the majority of crops. Populatio...

  5. Surgical management of Aspergillus colonization associated with lung hydatid disease.

    PubMed

    Vasquez, Julio C; Montesinos, Efrain; Rojas, Luis; Peralta, Julio; Delarosa, Jacob; Leon, Juan J

    2008-04-01

    Colonization with Aspergillus sp. usually occurs in previously formed lung cavities. Cystectomy is a widely used surgical technique for hydatid lung disease that can also leave residual cavities and potentially result in aspergilloma. We present two cases of this rare entity and a case with Aspergillus sp. colonization of an existing ruptured hydatid cyst.

  6. Fatal coinfection with Legionella pneumophila serogroup 8 and Aspergillus fumigatus.

    PubMed

    Guillouzouic, Aurélie; Bemer, Pascale; Gay-Andrieu, Françoise; Bretonnière, Cédric; Lepelletier, Didier; Mahé, Pierre-Joachim; Villers, Daniel; Jarraud, Sophie; Reynaud, Alain; Corvec, Stéphane

    2008-02-01

    Legionella pneumophila is an important cause of community-acquired and nosocomial pneumonia. We report on a patient who simultaneously developed L. pneumophila serogroup 8 pneumonia and Aspergillus fumigatus lung abscesses. Despite appropriate treatments, Aspergillus disease progressed with metastasis. Coinfections caused by L. pneumophila and A. fumigatus remain exceptional. In apparently immunocompetent patients, corticosteroid therapy is a key risk factor for aspergillosis.

  7. Ecology, development and gene regulation in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is one of the most widely known species of Aspergillus. It was described as a species in 1809 and first reported as a plant pathogen in 1920. More recently, A. flavus has emerged as an important opportunistic pathogen and is now recognized as the second leading cause of aspergill...

  8. Prospective Multicenter International Surveillance of Azole Resistance in Aspergillus fumigatus

    PubMed Central

    Arendrup, M.C.; Warris, A.; Lagrou, K.; Pelloux, H.; Hauser, P.M.; Chryssanthou, E.; Mellado, E.; Kidd, S.E.; Tortorano, A.M.; Dannaoui, E.; Gaustad, P.; Baddley, J.W.; Uekötter, A.; Lass-Flörl, C.; Klimko, N.; Moore, C.B.; Denning, D.W.; Pasqualotto, A.C.; Kibbler, C.; Arikan-Akdagli, S.; Andes, D.; Meletiadis, J.; Naumiuk, L.; Nucci, M.; Melchers, W.J.G.; Verweij, P.E.

    2015-01-01

    To investigate azole resistance in clinical Aspergillus isolates, we conducted prospective multicenter international surveillance. A total of 3,788 Aspergillus isolates were screened in 22 centers from 19 countries. Azole-resistant A. fumigatus was more frequently found (3.2% prevalence) than previously acknowledged, causing resistant invasive and noninvasive aspergillosis and severely compromising clinical use of azoles. PMID:25988348

  9. Sexual reproduction in Aspergillus tubingensis from section Nigri

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A sclerotium-forming member of Aspergillus section Nigri was sampled from a population in a single field in North Carolina, USA, and identified as A. tubingensis based on genealogical concordance analysis. Aspergillus tubingensis was shown to be heterothallic, with individual strains containing ei...

  10. The current status of species recognition and identification in Aspergillus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genus Aspergillus is a large economically important genus of fungi. In agriculture, some of the 250 species in this genus cause disease in plants and animals and some also produce poisons (mycotoxins) in foods and feeds. Aspergillus is a major killer of immunosuppressed people, such as diabeti...

  11. Aflatoxin production by entomopathogenic isolates of Aspergillus parasiticus and Aspergillus flavus.

    PubMed

    Drummond, J; Pinnock, D E

    1990-05-01

    Fourteen isolates of Aspergillus parasiticus and 2 isolates of Aspergillus flavus isolated from the mealybug Saccharicoccus sacchari were analyzed for production of aflatoxins B1, B2, G1, and G2 in liquid culture over a 20-day period. Twelve Aspergillus isolates including 11 A. parasiticus and 1 A. flavus produced aflatoxins which were extracted from both the mycelium and culture filtrate. Aflatoxin production was detected at day 3 and was detected continually for up to day 20. Aflatoxin B1 production was greatest between 7 and 10 days and significantly higher quantities were produced by A. flavus compared to A. parasiticus. Aflatoxin production was not a stable trait in 1 A. parasiticus isolate passaged 50 times on agar. In addition to loss of aflatoxin production, an associated loss in sporulation ability was also observed in this passaged isolate, although it did maintain pathogenicity against S. sacchari. An aflatoxin B1 concentration of 0.16 micrograms/mealybug (14.2 micrograms/g wet wt) was detected within the tissues of infected mealybugs 7 days after inoculation. In conclusion, the ability of Aspergillus isolates to produce aflatoxins was not essential to the entomopathogenic activity of this fungus against its host S. sacchari.

  12. Aspergillus tanneri sp. nov, a new pathogenic Aspergillus that causes invasive disease refractory to antifungal therapy

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This is the first report documenting fatal invasive aspergillosis caused by a new pathogenic Aspergillus species that is inherently resistant to antifungal drugs. Phenotypic characteristics of A. tanneri combined with the molecular approach enabled diagnosis of this new pathogen. This study undersco...

  13. Computational Tools for Genomic Studies in Plants.

    PubMed

    Martinez, Manuel

    2016-12-01

    In recent years, the genomic sequences of numerous plant species, including the main crop species, have been determined. Computational tools have been developed to track which plants have been sequenced and where the sequences are hosted. In this mini-review, the databases for genome projects, the databases created to host species/clade projects and the databases developed to perform plant comparative genomics are reviewed. Because of their importance in modern research, an in-depth analysis of the plant comparative genomics databases has been performed. This comparative analysis focuses on the common and specific computational tools developed to achieve the particular objectives of each database. In addition, emerging high-performance bioinformatics tools specific to plant research are discussed, as are the kinds of computational approaches that should be implemented in the coming years to analyze plant genomes efficiently.

  14. Reduction of aflatoxin production by Aspergillus flavus and Aspergillus parasiticus in interaction with Streptomyces.

    PubMed

    Verheecke, C; Liboz, T; Anson, P; Diaz, R; Mathieu, F

    2015-05-01

    The aim of this study is to investigate aflatoxin gene expression during Streptomyces-Aspergillus interaction. Aflatoxins are carcinogenic compounds produced mainly by Aspergillus flavus and Aspergillus parasiticus. A previous study has shown that Streptomyces-A. flavus interaction can reduce aflatoxin content in vitro. Here, we first validated this same effect in the interaction with A. parasiticus. Moreover, we showed that growth reduction and aflatoxin content were correlated in A. parasiticus but not in A. flavus. Secondly, we investigated the mechanisms of action by reverse-transcriptase quantitative PCR. As microbial interaction can lead to variations in expression of housekeeping genes, the most stable [act1, βtub (and cox5 for A. parasiticus)] were chosen using geNorm software. To shed light on the mechanisms involved, we studied the expression of five genes (aflD, aflM, aflP, aflR and aflS) during the interaction. Overall, the results of aflatoxin gene expression showed that Streptomyces repressed gene expression to a greater level in A. parasiticus than in A. flavus. Expression of aflR and aflS was generally repressed in both Aspergillus species. Expression of aflM was repressed and was correlated with aflatoxin B1 content. The results suggest that aflM expression could be a potential aflatoxin indicator in Streptomyces species interactions. Therefore, we demonstrate that Streptomyces can reduce aflatoxin production by both Aspergillus species and that this effect can be correlated with the repression of aflM expression.

  15. CottonDB: A resource for cotton genome research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonDB (http://cottondb.org/) is a database and web resource for cotton genomic and genetic research. Created in 1995, CottonDB was among the first plant genome databases established by the USDA-ARS. Accessed through a website interface, the database aims to be a convenient, inclusive medium of ...

  16. ADOPTING SELECTED HYDROGEN BONDING AND IONIC INTERACTIONS FROM ASPERGILLUS FUMIGATUS PHYTASE STRUCTURE IMPROVES THE THERMOSTABILITY OF ASPERGILLUS NIGER PHYA PHYTASE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Although it has been widely used as a feed supplement to reduce manure phosphorus pollution of swine and poultry, Aspergillus niger PhyA phytase is unable to withstand heat inactivation during feed pelleting. Crystal structure comparisons with its close homolog, the thermostable Aspergillus fumigatu...

  17. Constitutive expression of fluorescent protein by Aspergillus var. niger and Aspergillus carbonarius to monitor fungal colonization in maize plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus niger and A. carbonarius are two species in the Aspergillus section Nigri (black-spored aspergilli) frequently associated with peanut (Arachis hypogaea), maize (Zea mays), and other plants as pathogens. These infections are symptomless and as such are major concerns since some black aspe...

  18. Fumitoxins, new mycotoxins from Aspergillus fumigatus Fres.

    PubMed Central

    Debeaupuis, J P; Lafont, P

    1978-01-01

    Extracts of cultures of Aspergillus fumigatus isolated from silage were lethal to chicken embryos. Using this test and thin-layer chromatography, four UV-absorbing toxins, designated as fumitoxins A, B, C and D, were isolated. Analysis and mass spectrometry of crystallized fumitoxin A, the most abundant in the extract, established its molecular formula to be C31H42O8. Infrared, UV spectroscopy, and chemical reactions suggested that fumitoxin A is a steroid. Fumitoxins appear to be clearly different from the previously described toxins recognized in A. fumigatus. PMID:358921

  19. Aspergillus otomycosis in an immunocompromised patient.

    PubMed

    Rutt, Amy L; Sataloff, Robert T

    2008-11-01

    Aspergillus niger, an opportunistic filamentous fungus, was identified as the cause of chronic unilateral otomycosis in a 55-year-old, immunocompromised man who had been unresponsive to a variety of treatment regimens. The patient presented with intermittent otalgia and otorrhea and with a perforation of his left tympanic membrane. A. niger was identified in a culture specimen obtained from the patient's left ear canal. In immunocompromised patients, it is important that the treatment of otomycosis be prompt and vigorous, to minimize the likelihood of hearing loss and invasive temporal bone infection.

  20. JICST Factual Database JICST DNA Database

    NASA Astrophysics Data System (ADS)

    Shirokizawa, Yoshiko; Abe, Atsushi

    The Japan Information Center of Science and Technology (JICST) started an online DNA database service in October 1988. This database is composed of the EMBL Nucleotide Sequence Library and the Genetic Sequence Data Bank. The authors outline the database system, data items and search commands. Examples of retrieval sessions are presented.