Sample records for aspergillus genome database

  1. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations.

    PubMed

    Cerqueira, Gustavo C; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequenced tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.

  2. What the Aspergillus genomes have told us.

    PubMed

    Nierman, W C; May, G; Kim, H S; Anderson, M J; Chen, D; Denning, D W

    2005-05-01

    The sequencing and annotation of the genomes of the first strains of Aspergillus nidulans, Aspergillus oryzae, and Aspergillus fumigatus will be seen in retrospect as a transformational event in Aspergillus biology. With this event the entire genetic composition of A. nidulans, the sexual experimental model organism of the genus Aspergillus, A. oryzae, the food biotechnology organism which is the product of centuries of cultivation, and A. fumigatus, the most common causative agent of invasive aspergillosis is now revealed to the extent that we are at present able to understand. Each genome exhibits a large set of genes common to the three as well as a much smaller set of genes unique to each. Moreover, these sequences serve as resources providing the major tool to expanding our understanding of the biology of each. Transcription profiling of A. fumigatus at high temperatures and comparative genomic hybridization between A. fumigatus and a closely related Aspergillus species provides microarray based examples of the beginning of functional analysis of the genomes of these organisms going forward from the genome sequence.

  3. The function and evolution of the Aspergillus genome

    PubMed Central

    Gibbons, John G.; Rokas, Antonis

    2012-01-01

    Species in the filamentous fungal genus Aspergillus display a wide diversity of lifestyles and are of great importance to humans. The decoding of genome sequences from a dozen species that vary widely in their degree of evolutionary affinity has galvanized studies of the function and evolution of the Aspergillus genome in clinical, industrial, and agricultural environments. Here, we synthesize recent key findings that shed light on the architecture of the Aspergillus genome, on the molecular foundations of the genus’ astounding dexterity and diversity in secondary metabolism, and on the genetic underpinnings of virulence in Aspergillus fumigatus, one of the most lethal fungal pathogens. Many of these insights dramatically expand our knowledge of fungal and microbial eukaryote genome evolution and function and argue that Aspergillus constitutes a superb model clade for the study of functional and comparative genomics. PMID:23084572

  4. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    USDA-ARS?s Scientific Manuscript database

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  5. Comparative Reannotation of 21 Aspergillus Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Salamov, Asaf; Riley, Robert; Kuo, Alan

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one whichmore » most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models ( ~;;2percent per genome), supported by comparative analysis, additionally correcting ~;;18percent of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~;;3percent increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.« less

  6. Genomic Islands in Pathogenic Filamentous Fungus Aspergillus fumigatus

    USDA-ARS?s Scientific Manuscript database

    We present the genome sequences of a new clinical isolate, CEA10, of an important human pathogen, Aspergillus fumigatus, and two closely related, but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of CEA10 with the recently sequen...

  7. Genome sequence of Aspergillus luchuensis NBRC 4314

    PubMed Central

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. In contrast, that yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and N-and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. PMID:27651094

  8. What can comparative genomics tell us about species concepts in the genus Aspergillus?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rokas, Antonis; payne, gary; Federova, Natalie D.

    2007-12-15

    Understanding the nature of species" boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A.more » flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.« less

  9. Sequencing and functional annotation of the whole genome of the filamentous fungus Aspergillus westerdijkiae.

    PubMed

    Han, Xiaolong; Chakrabortti, Alolika; Zhu, Jindong; Liang, Zhao-Xun; Li, Jinming

    2016-08-15

    Aspergillus westerdijkiae produces ochratoxin A (OTA) in Aspergillus section Circumdati. It is responsible for the contamination of agricultural crops, fruits, and food commodities, as its secondary metabolite OTA poses a potential threat to animals and humans. As a member of the filamentous fungi family, its capacity for enzymatic catalysis and secondary metabolite production is valuable in industrial production and medicine. To understand the genetic factors underlying its pathogenicity, enzymatic degradation, and secondary metabolism, we analysed the whole genome of A. westerdijkiae and compared it with eight other sequenced Aspergillus species. We sequenced the complete genome of A. westerdijkiae and assembled approximately 36 Mb of its genomic DNA, in which we identified 10,861 putative protein-coding genes. We constructed a phylogenetic tree of A. westerdijkiae and eight other sequenced Aspergillus species and found that the sister group of A. westerdijkiae was the A. oryzae - A. flavus clade. By searching the associated databases, we identified 716 cytochrome P450 enzymes, 633 carbohydrate-active enzymes, and 377 proteases. By combining comparative analysis with Kyoto Encyclopaedia of Genes and Genomes (KEGG), Conserved Domains Database (CDD), and Pfam annotations, we predicted 228 potential carbohydrate-active enzymes related to plant polysaccharide degradation (PPD). We found a large number of secondary biosynthetic gene clusters, which suggested that A. westerdijkiae had a remarkable capacity to produce secondary metabolites. Furthermore, we obtained two more reliable and integrated gene sequences containing the reported portions of OTA biosynthesis and identified their respective secondary metabolite clusters. We also systematically annotated these two hybrid t1pks-nrps gene clusters involved in OTA biosynthesis. These two clusters were separate in the genome, and one of them encoded a couple of GH3 and AA3 enzyme genes involved in sucrose and glucose

  10. Post-genomic insights into the plant polysaccharide degradation potential of Aspergillus nidulans and comparison to Aspergillus niger and Aspergillus oryzae.

    PubMed

    Coutinho, Pedro M; Andersen, Mikael R; Kolenova, Katarina; vanKuyk, Patricia A; Benoit, Isabelle; Gruben, Birgit S; Trejo-Aguilar, Blanca; Visser, Hans; van Solingen, Piet; Pakula, Tiina; Seiboth, Bernard; Battaglia, Evy; Aguilar-Osorio, Guillermo; de Jong, Jan F; Ohm, Robin A; Aguilar, Mariana; Henrissat, Bernard; Nielsen, Jens; Stålbrand, Henrik; de Vries, Ronald P

    2009-03-01

    The plant polysaccharide degradative potential of Aspergillus nidulans was analysed in detail and compared to that of Aspergillus niger and Aspergillus oryzae using a combination of bioinformatics, physiology and transcriptomics. Manual verification indicated that 28.4% of the A. nidulans ORFs analysed in this study do not contain a secretion signal, of which 40% may be secreted through a non-classical method.While significant differences were found between the species in the numbers of ORFs assigned to the relevant CAZy families, no significant difference was observed in growth on polysaccharides. Growth differences were observed between the Aspergilli and Podospora anserina, which has a more different genomic potential for polysaccharide degradation, suggesting that large genomic differences are required to cause growth differences on polysaccharides. Differences were also detected between the Aspergilli in the presence of putative regulatory sequences in the promoters of the ORFs of this study and correlation of the presence of putative XlnR binding sites to induction by xylose was detected for A. niger. These data demonstrate differences at genome content, substrate specificity of the enzymes and gene regulation in these three Aspergilli, which likely reflect their individual adaptation to their natural biotope.

  11. Structural and functional insights of β-glucosidases identified from the genome of Aspergillus fumigatus

    NASA Astrophysics Data System (ADS)

    Dodda, Subba Reddy; Aich, Aparajita; Sarkar, Nibedita; Jain, Piyush; Jain, Sneha; Mondal, Sudipa; Aikat, Kaustav; Mukhopadhyay, Sudit S.

    2018-03-01

    Thermostable glucose tolerant β-glucosidase from Aspergillus species has attracted worldwide interest for their potentiality in industrial applications and bioethanol production. A strain of Aspergillus fumigatus (AfNITDGPKA3) identified by our laboratory from straw retting ground showed higher cellulase activity, specifically the β-glucosidase activity, compared to other contemporary strains. Though A. fumigatus has been known for high cellulase activity, detailed identification and characterization of the cellulase genes from their genome is yet to be done. In this work we have been analyzed the cellulase genes from the genome sequence database of Aspergillus fumigatus (Af293). Genome analysis suggests two cellobiohydrolase, eleven endoglucanase and seventeen β-glucosidase genes present. β-Glucosidase genes belong to either Glycohydro1 (GH1 or Bgl1) or Glycohydro3 (GH3 or Bgl3) family. The sequence similarity suggests that Bgl1 and Bgl3 of A. fumagatus are phylogenetically close to those of A. fisheri and A. oryzae. The modelled structure of the Bgl1 predicts the (β/α)8 barrel type structure with deep and narrow active site, whereas, Bgl3 shows the (α/β)8 barrel and (α/β)6 sandwich structure with shallow and open active site. Docking results suggest that amino acids Glu544, Glu466, Trp408,Trp567,Tyr44,Tyr222,Tyr770,Asp844,Asp537,Asn212,Asn217 of Bgl3 and Asp224,Asn242,Glu440, Glu445, Tyr367, Tyr365,Thr994,Trp435,Trp446 of Bgl1 are involved in the hydrolysis. Binding affinity analyses suggest that Bgl3 and Bgl1 enzymes are more active on the substrates like 4-methylumbelliferyl glycoside (MUG) and p-nitrophenyl-β-D-1, 4-glucopyranoside (pNPG) than on cellobiose. Further docking with glucose suggests that Bgl1 is more glucose tolerant than Bgl3. Analysis of the Aspergillus fumigatus genome may help to identify a β-glucosidase enzyme with better property and the structural information may help to develop an engineered recombinant enzyme.

  12. Draft Genome Sequence of Aspergillus oryzae ATCC 12892

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deng, Shuang; Pomraning, Kyle R.; Bohutskyi, Pavlo

    The draft genome sequence ofAspergillus oryzaeATCC 12892 is presented here.A. oryzaeproduces 3-nitropropionic acid, which has been investigated with regard to understanding the biosynthesis of nitroorganic compounds.

  13. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  14. Genome databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts inmore » the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.« less

  15. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE PAGES

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.; ...

    2018-01-09

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) has been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in othermore » sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.« less

  16. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) has been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in othermore » sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.« less

  17. The Sequenced Angiosperm Genomes and Genome Databases.

    PubMed

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  18. The Sequenced Angiosperm Genomes and Genome Databases

    PubMed Central

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology. PMID:29706973

  19. Survey of protein–DNA interactions in Aspergillus oryzae on a genomic scale

    PubMed Central

    Wang, Chao; Lv, Yangyong; Wang, Bin; Yin, Chao; Lin, Ying; Pan, Li

    2015-01-01

    The genome-scale delineation of in vivo protein–DNA interactions is key to understanding genome function. Only ∼5% of transcription factors (TFs) in the Aspergillus genus have been identified using traditional methods. Although the Aspergillus oryzae genome contains >600 TFs, knowledge of the in vivo genome-wide TF-binding sites (TFBSs) in aspergilli remains limited because of the lack of high-quality antibodies. We investigated the landscape of in vivo protein–DNA interactions across the A. oryzae genome through coupling the DNase I digestion of intact nuclei with massively parallel sequencing and the analysis of cleavage patterns in protein–DNA interactions at single-nucleotide resolution. The resulting map identified overrepresented de novo TF-binding motifs from genomic footprints, and provided the detailed chromatin remodeling patterns and the distribution of digital footprints near transcription start sites. The TFBSs of 19 known Aspergillus TFs were also identified based on DNase I digestion data surrounding potential binding sites in conjunction with TF binding specificity information. We observed that the cleavage patterns of TFBSs were dependent on the orientation of TF motifs and independent of strand orientation, consistent with the DNA shape features of binding motifs with flanking sequences. PMID:25883143

  20. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    PubMed

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.

  1. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    PubMed Central

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180

  2. Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

    PubMed

    Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C

    2012-12-12

    The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues into the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 Kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus), contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing the Aspergillus systematics. The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population

  3. Mycobacteriophage genome database.

    PubMed

    Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

    2011-01-01

    Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises of 6086 genes from 64 mycobacteriophages classified into 72 families based on ACLAME database. Manual curation was aided by information available from public databases which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.

  4. Standards for Clinical Grade Genomic Databases.

    PubMed

    Yohe, Sophia L; Carter, Alexis B; Pfeifer, John D; Crawford, James M; Cushman-Vokoun, Allison; Caughron, Samuel; Leonard, Debra G B

    2015-11-01

    Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. To define CGGD and the categories of information contained in CGGDs and to frame recommendations for the structure and use of these databases in clinical patient care. Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.

  5. A multilocus database for the identification of Aspergillus and Penicillium species

    USDA-ARS?s Scientific Manuscript database

    Identification of Aspergillus and Penicillium isolates using phenotypic methods is increasingly complex and difficult but genetic tools allow recognition and description of species formerly unrecognized or cryptic. We constructed a web-based taxonomic database using BIGSdb for the identification of ...

  6. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    PubMed

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  7. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    PubMed Central

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-01-01

    Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method. PMID:19193216

  8. Comparative Genome Analysis Between Aspergillus oryzae Strains Reveals Close Relationship Between Sites of Mutation Localization and Regions of Highly Divergent Genes among Aspergillus Species

    PubMed Central

    Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.; Machida, Masayuki; Hirano, Takashi

    2012-01-01

    Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome. PMID:22912434

  9. Comparative genome analysis between Aspergillus oryzae strains reveals close relationship between sites of mutation localization and regions of highly divergent genes among Aspergillus species.

    PubMed

    Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E; Machida, Masayuki; Hirano, Takashi

    2012-10-01

    Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome.

  10. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    PubMed Central

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim; Krogsgaard, Steen; Nielsen, Jens

    2008-01-01

    Background Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number of hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other related fungi. Here we proposed the gene prediction by construction of an A. oryzae Expressed Sequence Tag (EST) library, sequencing and assembly. We enhanced the function assignment by our developed annotation strategy. The resulting better annotation was used to reconstruct the metabolic network leading to a genome scale metabolic model of A. oryzae. Results Our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Noteworthy, our annotation strategy resulted in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources. Conclusion A much enhanced annotation of the A

  11. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  12. Comparative genomic and transcriptomic analyses of the Fuzhuan brick tea-fermentation fungus Aspergillus cristatus.

    PubMed

    Ge, Yongyi; Wang, Yuchen; Liu, YongXiang; Tan, Yumei; Ren, Xiuxiu; Zhang, Xinyu; Hyde, Kevin D; Liu, Yongfeng; Liu, Zuoyi

    2016-06-07

    Aspergillus cristatus is the dominant fungus involved in the fermentation of Chinese Fuzhuan brick tea. Aspergillus cristatus is a homothallic fungus that undergoes a sexual stage without asexual conidiation when cultured in hypotonic medium. The asexual stage is induced by a high salt concentration, which completely inhibits sexual development. The taxon is therefore appropriate for investigating the mechanisms of asexual and sexual reproduction in fungi. In this study, de novo genome sequencing and analysis of transcriptomes during culture under high- and low-osmolarity conditions were performed. These analyses facilitated investigation of the evolution of mating-type genes, which determine the mode of sexual reproduction, in A. cristatus, the response of the high-osmolarity glycerol (HOG) pathway to osmotic stimulation, and the detection of mycotoxins and evaluation of the relationship with the location of the encoding genes. The A. cristatus genome comprised 27.9 Mb and included 68 scaffolds, from which 10,136 protein-coding gene models were predicted. A phylogenetic analysis suggested a considerable phylogenetic distance between A. cristatus and A. nidulans. Comparison of the mating-type gene loci among Aspergillus species indicated that the mode in A. cristatus differs from those in other Aspergillus species. The components of the HOG pathway were conserved in the genome of A. cristatus. Differential gene expression analysis in A. cristatus using RNA-Seq demonstrated that the expression of most genes in the HOG pathway was unaffected by osmotic pressure. No gene clusters associated with the production of carcinogens were detected. A model of the mating-type locus in A. cristatus is reported for the first time. Aspergillus cristatus has evolved various mechanisms to cope with high osmotic stress. As a fungus associated with Fuzhuan tea, it is considered to be safe under low- and high-osmolarity conditions.

  13. Genome sequences of three strains of Aspergillus flavus for the biological control of Aflatoxin

    USDA-ARS?s Scientific Manuscript database

    The genomes of three strains of Aspergillus flavus with demonstrated utility for the biological control of aflatoxin were sequenced. These sequences were assembled with MIRA and annotated with Augustus using A. flavus strain 3357 (NCBI EQ963472) as a reference. Each strain had a genome of 36.3 to ...

  14. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  15. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  16. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

    PubMed Central

    Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

  17. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    USDA-ARS?s Scientific Manuscript database

    Aspergillus flavus and A. parasiticus fungi, carcinogen-mycotoxins producers, infect peanut seeds, causing considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  18. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  19. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  20. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query

  1. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database

  2. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

    PubMed

    Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-04

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. BGD: a database of bat genomes.

    PubMed

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  4. Aspergillus Niger Genomics: Past, Present and into the Future

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Scott E.

    2006-09-01

    Aspergillus niger is a filamentous ascomycete fungus that is ubiquitous in the environment and has been implicated in opportunistic infections of humans. In addition to its role as an opportunistic human pathogen, A. niger is economically important as a fermentation organism used for the production of citric acid. Industrial citric acid production by A. niger represents one of the most efficient, highest yield bioprocesses in use currently by industry. The genome size of A. niger is estimated to be between 35.5 and 38.5 megabases (Mb) divided among eight chromosomes/linkage groups that vary in size from 3.5 - 6.6 Mb. Currently,more » there are three independent A. niger genome projects, an indication of the economic importance of this organism. The rich amount of data resulting from these multiple A. niger genome sequences will be used for basic and applied research programs applicable to fermentation process development, morphology and pathogenicity.« less

  5. Enhanced production of fructosyltransferase in Aspergillus oryzae by genome shuffling.

    PubMed

    Wang, Shenghai; Duan, Mengjie; Liu, Yalan; Fan, Sen; Lin, Xiaoshan; Zhang, Yi

    2017-03-01

    To breed Aspergillus oryzae strains with high fructosyltransferase (FTase) activity using intraspecific protoplast fusion via genome-shuffling. A candidate library was developed using UV/LiCl of the conidia of A. oryzae SBB201. By screening for enzyme activity and cell biomass, two mutants (UV-11 and UV-76) were chosen for protoplast fusion and subsequent genome shuffling. After three rounds of genome recombination, a fusion mutant RIII-7 was obtained. Its FTase activity was 180 U g -1 , approximately double that of the original strain, and RIII-7 was genetically stable. In fermentation culture, FTase activity of the genome-shuffled strain reached a maximum of 353 U g -1 using substrate-feeding method, and this value was approximately 3.4-times higher than that of the original strain A. oryzae SBB201. Intraspecific protoplast fusion of A. oryzae significantly enhanced FTase activity and generated a potentially useful strain for industrial production.

  6. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

    PubMed

    Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

    2009-01-01

    Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.

  7. Analysis of Aspergillus nidulans metabolism at the genome-scale

    PubMed Central

    David, Helga; Özçelik, İlknur Ş; Hofmann, Gerald; Nielsen, Jens

    2008-01-01

    Background Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further

  8. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.

  9. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to processes genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660

  10. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  11. Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

    PubMed

    Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin

    2018-04-12

    The ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Having realized the importance of this plant to humans, an integrated omics resource becomes indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database http://ginsengdb.snu.ac.kr /, the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (of approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. Ginseng genome database can be accessed at http://ginsengdb.snu.ac.kr /.

  12. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE PAGES

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; ...

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involvedmore » in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.« less

  13. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involvedmore » in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.« less

  14. Recent updates and developments to plant genome size databases

    PubMed Central

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  15. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management, however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm pair-wise, overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).Inc. © 2015 Wiley Periodicals, Inc.

  16. Genome mining and functional genomics for siderophore production in Aspergillus niger.

    PubMed

    Franken, Angelique C W; Lechner, Beatrix E; Werner, Ernst R; Haas, Hubertus; Lokman, B Christien; Ram, Arthur F J; van den Hondel, Cees A M J J; de Weert, Sandra; Punt, Peter J

    2014-11-01

    Iron is an essential metal for many organisms, but the biologically relevant form of iron is scarce because of rapid oxidation resulting in low solubility. Simultaneously, excessive accumulation of iron is toxic. Consequently, iron uptake is a highly controlled process. In most fungal species, siderophores play a central role in iron handling. Siderophores are small iron-specific chelators that can be secreted to scavenge environmental iron or bind intracellular iron with high affinity. A second high-affinity iron uptake mechanism is reductive iron assimilation (RIA). As shown in Aspergillus fumigatus and Aspergillus nidulans, synthesis of siderophores in Aspergilli is predominantly under control of the transcription factors SreA and HapX, which are connected by a negative transcriptional feedback loop. Abolishing this fine-tuned regulation corroborates iron homeostasis, including heme biosynthesis, which could be biotechnologically of interest, e.g. the heterologous production of heme-dependent peroxidases. Aspergillus niger genome inspection identified orthologues of several genes relevant for RIA and siderophore metabolism, as well as sreA and hapX. Interestingly, genes related to synthesis of the common fungal extracellular siderophore triacetylfusarinine C were absent. Reverse-phase high-performance liquid chromatography (HPLC) confirmed the absence of triacetylfusarinine C, and demonstrated that the major secreted siderophores of A. niger are coprogen B and ferrichrome, which is also the dominant intracellular siderophore. In A. niger wild type grown under iron-replete conditions, the expression of genes involved in coprogen biosynthesis and RIA was low in the exponential growth phase but significantly induced during ascospore germination. Deletion of sreA in A. niger resulted in elevated iron uptake and increased cellular ferrichrome accumulation. Increased sensitivity toward phleomycin and high iron concentration reflected the toxic effects of excessive

  17. CyanoBase: the cyanobacteria genome database update 2010.

    PubMed

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  18. MPD: a pathogen genome and metagenome database

    PubMed Central

    Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen

    2018-01-01

    Abstract Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040

  19. WGE: a CRISPR database for genome engineering.

    PubMed

    Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J; Skarnes, William C; Iyer, Vivek

    2015-09-15

    The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. The WGE database is freely available at www.sanger.ac.uk/htgt/wge : vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  20. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  1. CyanoBase: the cyanobacteria genome database update 2010

    PubMed Central

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly. PMID:19880388

  2. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-07-14

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. Copyright © 2016 Singh et al.

  3. WheatGenome.info: an integrated database and portal for wheat genome information.

    PubMed

    Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  4. Genome Sequences of Three Strains of Aspergillus flavus for the Biological Control of Aflatoxin

    PubMed Central

    Scheffler, Brian E.; Duke, Mary; Ballard, Linda; Abbas, Hamed K.; Grodowitz, Michael J.

    2017-01-01

    ABSTRACT Aflatoxin is a carcinogenic contaminant of many commodities that are infected by Aspergillus flavus. Nonaflatoxigenic strains of A. flavus have been utilized as biological control agents. Here, we report the genome sequences from three biocontrol strains. This information will be useful in developing markers for postrelease monitoring of these fungi. PMID:29097466

  5. Brassica ASTRA: an integrated database for Brassica genomic research.

    PubMed

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  6. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    USDA-ARS?s Scientific Manuscript database

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  7. The UCSC Genome Browser database: extensions and updates 2013.

    PubMed

    Meyer, Laurence R; Zweig, Ann S; Hinrichs, Angie S; Karolchik, Donna; Kuhn, Robert M; Wong, Matthew; Sloan, Cricket A; Rosenbloom, Kate R; Roe, Greg; Rhead, Brooke; Raney, Brian J; Pohl, Andy; Malladi, Venkat S; Li, Chin H; Lee, Brian T; Learned, Katrina; Kirkup, Vanessa; Hsu, Fan; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Goldman, Mary; Giardine, Belinda M; Fujita, Pauline A; Dreszer, Timothy R; Diekhans, Mark; Cline, Melissa S; Clawson, Hiram; Barber, Galt P; Haussler, David; Kent, W James

    2013-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.

  8. Comparative genomics reveals high biological diversity and specific adaptations in the industrially and medically important fungal genus Aspergillus.

    PubMed

    de Vries, Ronald P; Riley, Robert; Wiebenga, Ad; Aguilar-Osorio, Guillermo; Amillis, Sotiris; Uchima, Cristiane Akemi; Anderluh, Gregor; Asadollahi, Mojtaba; Askin, Marion; Barry, Kerrie; Battaglia, Evy; Bayram, Özgür; Benocci, Tiziano; Braus-Stromeyer, Susanna A; Caldana, Camila; Cánovas, David; Cerqueira, Gustavo C; Chen, Fusheng; Chen, Wanping; Choi, Cindy; Clum, Alicia; Dos Santos, Renato Augusto Corrêa; Damásio, André Ricardo de Lima; Diallinas, George; Emri, Tamás; Fekete, Erzsébet; Flipphi, Michel; Freyberg, Susanne; Gallo, Antonia; Gournas, Christos; Habgood, Rob; Hainaut, Matthieu; Harispe, María Laura; Henrissat, Bernard; Hildén, Kristiina S; Hope, Ryan; Hossain, Abeer; Karabika, Eugenia; Karaffa, Levente; Karányi, Zsolt; Kraševec, Nada; Kuo, Alan; Kusch, Harald; LaButti, Kurt; Lagendijk, Ellen L; Lapidus, Alla; Levasseur, Anthony; Lindquist, Erika; Lipzen, Anna; Logrieco, Antonio F; MacCabe, Andrew; Mäkelä, Miia R; Malavazi, Iran; Melin, Petter; Meyer, Vera; Mielnichuk, Natalia; Miskei, Márton; Molnár, Ákos P; Mulé, Giuseppina; Ngan, Chew Yee; Orejas, Margarita; Orosz, Erzsébet; Ouedraogo, Jean Paul; Overkamp, Karin M; Park, Hee-Soo; Perrone, Giancarlo; Piumi, Francois; Punt, Peter J; Ram, Arthur F J; Ramón, Ana; Rauscher, Stefan; Record, Eric; Riaño-Pachón, Diego Mauricio; Robert, Vincent; Röhrig, Julian; Ruller, Roberto; Salamov, Asaf; Salih, Nadhira S; Samson, Rob A; Sándor, Erzsébet; Sanguinetti, Manuel; Schütze, Tabea; Sepčić, Kristina; Shelest, Ekaterina; Sherlock, Gavin; Sophianopoulou, Vicky; Squina, Fabio M; Sun, Hui; Susca, Antonia; Todd, Richard B; Tsang, Adrian; Unkles, Shiela E; van de Wiele, Nathalie; van Rossen-Uffink, Diana; Oliveira, Juliana Velasco de Castro; Vesth, Tammi C; Visser, Jaap; Yu, Jae-Hyuk; Zhou, Miaomiao; Andersen, Mikael R; Archer, David B; Baker, Scott E; Benoit, Isabelle; Brakhage, Axel A; Braus, Gerhard H; Fischer, Reinhard; Frisvad, Jens C; Goldman, Gustavo H; Houbraken, Jos; Oakley, Berl; Pócsi, István; Scazzocchio, Claudio; Seiboth, Bernhard; vanKuyk, Patricia A; Wortman, Jennifer; Dyer, Paul S; Grigoriev, Igor V

    2017-02-14

    The fungal genus Aspergillus is of critical importance to humankind. Species include those with industrial applications, important pathogens of humans, animals and crops, a source of potent carcinogenic contaminants of food, and an important genetic model. The genome sequences of eight aspergilli have already been explored to investigate aspects of fungal biology, raising questions about evolution and specialization within this genus. We have generated genome sequences for ten novel, highly diverse Aspergillus species and compared these in detail to sister and more distant genera. Comparative studies of key aspects of fungal biology, including primary and secondary metabolism, stress response, biomass degradation, and signal transduction, revealed both conservation and diversity among the species. Observed genomic differences were validated with experimental studies. This revealed several highlights, such as the potential for sex in asexual species, organic acid production genes being a key feature of black aspergilli, alternative approaches for degrading plant biomass, and indications for the genetic basis of stress response. A genome-wide phylogenetic analysis demonstrated in detail the relationship of the newly genome sequenced species with other aspergilli. Many aspects of biological differences between fungal species cannot be explained by current knowledge obtained from genome sequences. The comparative genomics and experimental study, presented here, allows for the first time a genus-wide view of the biological diversity of the aspergilli and in many, but not all, cases linked genome differences to phenotype. Insights gained could be exploited for biotechnological and medical applications of fungi.

  9. The Ensembl genome database project.

    PubMed

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  10. Epidemiological and Genomic Landscape of Azole Resistance Mechanisms in Aspergillus Fungi

    PubMed Central

    Hagiwara, Daisuke; Watanabe, Akira; Kamei, Katsuhiko; Goldman, Gustavo H.

    2016-01-01

    Invasive aspergillosis is a life-threatening mycosis caused by the pathogenic fungus Aspergillus. The predominant causal species is Aspergillus fumigatus, and azole drugs are the treatment of choice. Azole drugs approved for clinical use include itraconazole, voriconazole, posaconazole, and the recently added isavuconazole. However, epidemiological research has indicated that the prevalence of azole-resistant A. fumigatus isolates has increased significantly over the last decade. What is worse is that azole-resistant strains are likely to have emerged not only in response to long-term drug treatment but also because of exposure to azole fungicides in the environment. Resistance mechanisms include amino acid substitutions in the target Cyp51A protein, tandem repeat sequence insertions at the cyp51A promoter, and overexpression of the ABC transporter Cdr1B. Environmental azole-resistant strains harboring the association of a tandem repeat sequence and punctual mutation of the Cyp51A gene (TR34/L98H and TR46/Y121F/T289A) have become widely disseminated across the world within a short time period. The epidemiological data also suggests that the number of Aspergillus spp. other than A. fumigatus isolated has risen. Some non-fumigatus species intrinsically show low susceptibility to azole drugs, imposing the need for accurate identification, and drug susceptibility testing in most clinical cases. Currently, our knowledge of azole resistance mechanisms in non-fumigatus Aspergillus species such as A. flavus, A. niger, A. tubingensis, A. terreus, A. fischeri, A. lentulus, A. udagawae, and A. calidoustus is limited. In this review, we present recent advances in our understanding of azole resistance mechanisms particularly in A. fumigatus. We then provide an overview of the genome sequences of non-fumigatus species, focusing on the proteins related to azole resistance mechanisms. PMID:27708619

  11. Retrovirus Integration Database (RID): a public database for retroviral insertion sites into host genomes.

    PubMed

    Shao, Wei; Shan, Jigui; Kearney, Mary F; Wu, Xiaolin; Maldarelli, Frank; Mellors, John W; Luke, Brian; Coffin, John M; Hughes, Stephen H

    2016-07-04

    The NCI Retrovirus Integration Database is a MySql-based relational database created for storing and retrieving comprehensive information about retroviral integration sites, primarily, but not exclusively, HIV-1. The database is accessible to the public for submission or extraction of data originating from experiments aimed at collecting information related to retroviral integration sites including: the site of integration into the host genome, the virus family and subtype, the origin of the sample, gene exons/introns associated with integration, and proviral orientation. Information about the references from which the data were collected is also stored in the database. Tools are built into the website that can be used to map the integration sites to UCSC genome browser, to plot the integration site patterns on a chromosome, and to display provirus LTRs in their inserted genome sequence. The website is robust, user friendly, and allows users to query the database and analyze the data dynamically. https://rid.ncifcrf.gov ; or http://home.ncifcrf.gov/hivdrp/resources.htm .

  12. The YeastGenome app: the Saccharomyces Genome Database at your fingertips.

    PubMed

    Wong, Edith D; Karra, Kalpana; Hitz, Benjamin C; Hong, Eurie L; Cherry, J Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations.

  13. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    PubMed

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  14. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

    PubMed Central

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033

  15. HOWDY: an integrated database system for human genome research

    PubMed Central

    Hirakawa, Mika

    2002-01-01

    HOWDY is an integrated database system for accessing and analyzing human genomic information (http://www-alis.tokyo.jst.go.jp/HOWDY/). HOWDY stores information about relationships between genetic objects and the data extracted from a number of databases. HOWDY consists of an Internet accessible user interface that allows thorough searching of the human genomic databases using the gene symbols and their aliases. It also permits flexible editing of the sequence data. The database can be searched using simple words and the search can be restricted to a specific cytogenetic location. Linear maps displaying markers and genes on contig sequences are available, from which an object can be chosen. Any search starting point identifies all the information matching the query. HOWDY provides a convenient search environment of human genomic data for scientists unsure which database is most appropriate for their search. PMID:11752279

  16. A web-based genomic sequence database for the Streptomycetaceae: a tool for systematics and genome mining

    USDA-ARS?s Scientific Manuscript database

    The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...

  17. The Saccharomyces Genome Database Variant Viewer

    PubMed Central

    Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  18. The Ruby UCSC API: accessing the UCSC genome database using Ruby.

    PubMed

    Mishima, Hiroyuki; Aerts, Jan; Katayama, Toshiaki; Bonnal, Raoul J P; Yoshiura, Koh-ichiro

    2012-09-21

    The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.The API uses the bin index-if available-when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.

  19. The Ruby UCSC API: accessing the UCSC genome database using Ruby

    PubMed Central

    2012-01-01

    Background The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/. PMID:22994508

  20. The salinity tolerant poplar database (STPD): a comprehensive database for studying tree salt-tolerant adaption and poplar genomics.

    PubMed

    Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun

    2015-03-17

    Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. Especially, they grow in different habitats, from salty soil to mesophytic environment, and are therefore used as a model genus for elucidating physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently the STPD contains Populus euphratica genome and its related genetic resources. P. euphratica, with a preference of the salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphisms information of P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including GBrowse genome browser, BLAST servers and genome alignments viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches. A new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and poplar genomics. The database will be continuously updated to incorporate new genome-wide data of related poplar species. This database will serve as an infrastructure for researches on the molecular function of genes, comparative genomics, and evolution in closely related species as well as promote advances in molecular breeding within Populus. The

  1. Specialized microbial databases for inductive exploration of microbial genome sequences

    PubMed Central

    Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

    2005-01-01

    Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474

  2. Phylogenomics databases for facilitating functional genomics in rice.

    PubMed

    Jung, Ki-Hong; Cao, Peijian; Sharma, Rita; Jain, Rashmi; Ronald, Pamela C

    2015-12-01

    The completion of whole genome sequence of rice (Oryza sativa) has significantly accelerated functional genomics studies. Prior to the release of the sequence, only a few genes were assigned a function each year. Since sequencing was completed in 2005, the rate has exponentially increased. As of 2014, 1,021 genes have been described and added to the collection at The Overview of functionally characterized Genes in Rice online database (OGRO). Despite this progress, that number is still very low compared with the total number of genes estimated in the rice genome. One limitation to progress is the presence of functional redundancy among members of the same rice gene family, which covers 51.6 % of all non-transposable element-encoding genes. There remain a significant portion or rice genes that are not functionally redundant, as reflected in the recovery of loss-of-function mutants. To more accurately analyze functional redundancy in the rice genome, we have developed a phylogenomics databases for six large gene families in rice, including those for glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases. In this review, we introduce key features and applications of these databases. We expect that they will serve as a very useful guide in the post-genomics era of research.

  3. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  4. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. The Yak genome database: an integrative database for studying yak biology and high-altitude adaption

    PubMed Central

    2012-01-01

    Background The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. Description The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD includes genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak. PMID:23134687

  6. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  7. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  8. The Saccharomyces Genome Database Variant Viewer.

    PubMed

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  10. Human Ageing Genomic Resources: new and updated databases

    PubMed Central

    Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E

    2018-01-01

    Abstract In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure a high quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237

  11. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  12. Genome Sequence of Aspergillus flavus NRRL 3357, a Strain That Causes Aflatoxin Contamination of Food and Feed.

    PubMed

    Nierman, William C; Yu, Jiujiang; Fedorova-Abrams, Natalie D; Losada, Liliana; Cleveland, Thomas E; Bhatnagar, Deepak; Bennett, Joan W; Dean, Ralph; Payne, Gary A

    2015-04-16

    Aflatoxin contamination of food and livestock feed results in significant annual crop losses internationally. Aspergillus flavus is the major fungus responsible for this loss. Additionally, A. flavus is the second leading cause of aspergillosis in immunocompromised human patients. Here, we report the genome sequence of strain NRRL 3357. Copyright © 2015 Nierman et al.

  13. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide

    PubMed Central

    Liolios, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Kyrpides, Nikos C.

    2006-01-01

    The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at . It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at PMID:16381880

  14. Design and implementation of the cacao genome database

    USDA-ARS?s Scientific Manuscript database

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  15. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  16. VCGDB: a dynamic genome database of the Chinese population

    PubMed Central

    2014-01-01

    Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222

  17. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

    PubMed

    Hiscock, D; Upton, C

    2000-05-01

    The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .

  18. Tripal: a construction toolkit for online genome databases.

    PubMed

    Ficklin, Stephen P; Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increases so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net.

  19. The integrated web service and genome database for agricultural plants with biotechnology information.

    PubMed

    Kim, Changkug; Park, Dongsuk; Seol, Youngjoo; Hahn, Jangho

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage.

  20. The integrated web service and genome database for agricultural plants with biotechnology information

    PubMed Central

    Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage. PMID:21887015

  1. Bolbase: a comprehensive genomics database for Brassica oleracea.

    PubMed

    Yu, Jingyin; Zhao, Meixia; Wang, Xiaowu; Tong, Chaobo; Huang, Shunmou; Tehrim, Sadia; Liu, Yumei; Hua, Wei; Liu, Shengyi

    2013-09-30

    Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, Brussels sprouts. This diversity along with its phylogenetic membership in a group of three diploid and three tetraploid species, and the recent availability of genome sequences within Brassica provide an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they

  2. Bolbase: a comprehensive genomics database for Brassica oleracea

    PubMed Central

    2013-01-01

    Background Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, Brussels sprouts. This diversity along with its phylogenetic membership in a group of three diploid and three tetraploid species, and the recent availability of genome sequences within Brassica provide an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. Description We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Conclusions Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation

  3. Addition of a breeding database in the Genome Database for Rosaceae

    PubMed Central

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  4. Genomic Target Database (GTD): A database of potential targets in human pathogenic bacteria

    PubMed Central

    Barh, Debmalya; Kumar, Anil; Misra, Amarendra Narayana

    2009-01-01

    A Genomic Target Database (GTD) has been developed having putative genomic drug targets for human bacterial pathogens. The selected pathogens are either drug resistant or vaccines are yet to be developed against them. The drug targets have been identified using subtractive genomics approaches and these are subsequently classified into Drug targets in pathogen specific unique metabolic pathways,Drug targets in host-pathogen common metabolic pathways, andMembrane localized drug targets. HTML code is used to link each target to its various properties and other available public resources. Essential resources and tools for subtractive genomic analysis, sub-cellular localization, vaccine and drug designing are also mentioned. To the best of authors knowledge, no such database (DB) is presently available that has listed metabolic pathways and membrane specific genomic drug targets based on subtractive genomics. Listed targets in GTD are readily available resource in developing drug and vaccine against the respective pathogen, its subtypes, and other family members. Currently GTD contains 58 drug targets for four pathogens. Shortly, drug targets for six more pathogens will be listed. Availability GTD is available at IIOAB website http://www.iioab.webs.com/GTD.htm. It can also be accessed at http://www.iioabdgd.webs.com.GTD is free for academic research and non-commercial use only. Commercial use is strictly prohibited without prior permission from IIOAB. PMID:20011153

  5. PoMaMo--a comprehensive database for potato genome data.

    PubMed

    Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane

    2005-01-01

    A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes.

  6. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    PubMed

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.

  7. Tripal: a construction toolkit for online genome databases

    PubMed Central

    Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E.; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E.; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increases so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net PMID:21959868

  8. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

    PubMed Central

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  10. Genomics and Public Health Research: Can the State Allow Access to Genomic Databases?

    PubMed Central

    Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton

    2012-01-01

    Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the Influenza pandemic plans developed by WHO, Canada, and Quebec, as examples of key tools framing public health decision-making process. We observed that State powers in public health, are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and, an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174

  11. Aspergillus tubingensis and Aspergillus niger as the dominant black Aspergillus, use of simple PCR-RFLP for preliminary differentiation.

    PubMed

    Mirhendi, H; Zarei, F; Motamedi, M; Nouripour-Sisakht, S

    2016-03-01

    This work aimed to identify the species distribution of common clinical and environmental isolates of black Aspergilli based on simple restriction fragment length polymorphism (RFLP) analysis of the β-tubulin gene. A total of 149 clinical and environmental strains of black Aspergilli were collected and subjected to preliminary morphological examination. Total genomic DNAs were extracted, and PCR was performed to amplify part of the β-tubulin gene. At first, 52 randomly selected samples were species-delineated by sequence analysis. In order to distinguish the most common species, PCR amplicons of 117 black Aspergillus strains were identified by simple PCR-RFLP analysis using the enzyme TasI. Among 52 sequenced isolates, 28 were Aspergillus tubingensis, 21 Aspergillus niger, and the three remaining isolates included Aspergillus uvarum, Aspergillus awamori, and Aspergillus acidus. All 100 environmental and 17 BAL samples subjected to TasI-RFLP analysis of the β-tubulin gene, fell into two groups, consisting of about 59% (n=69) A. tubingensis and 41% (n=48) A. niger. Therefore, the method successfully and rapidly distinguished A. tubingensis and A. niger as the most common species among the clinical and environmental isolates. Although tardy, the Ehrlich test was also able to differentiate A. tubingensis and A. niger according to the yellow color reaction specific to A. niger. A. tubingensis and A. niger are the most common black Aspergillus in both clinical and environmental isolates in Iran. PCR-RFLP using TasI digestion of β-tubulin DNA enables rapid screening for these common species. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  12. PoMaMo—a comprehensive database for potato genome data

    PubMed Central

    Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane

    2005-01-01

    A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes. PMID:15608284

  13. Insights from the genome of a high alkaline cellulase producing Aspergillus fumigatus strain obtained from Peruvian Amazon rainforest.

    PubMed

    Paul, Sujay; Zhang, Angel; Ludeña, Yvette; Villena, Gretty K; Yu, Fengan; Sherman, David H; Gutiérrez-Correa, Marcel

    2017-06-10

    Here, we report the complete genome sequence of a high alkaline cellulase producing Aspergillus fumigatus strain LMB-35Aa isolated from soil of Peruvian Amazon rainforest. The genome is ∼27.5mb in size, comprises of 228 scaffolds with an average GC content of 50%, and is predicted to contain a total of 8660 protein-coding genes. Of which, 6156 are with known function; it codes for 607 putative CAZymes families potentially involved in carbohydrate metabolism. Several important cellulose degrading genes, such as endoglucanase A, endoglucanase B, endoglucanase D and beta-glucosidase, are also identified. The genome of A. fumigatus strain LMB-35Aa represents the first whole sequenced genome of non-clinical, high cellulase producing A. fumigatus strain isolated from forest soil. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

    PubMed Central

    Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

    2011-01-01

    We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used the 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other Aspergilli sequenced. Assembly of 454 reads generated a non-redundant sequence of 39.5-Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of α-amyalse gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them that included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for the cyclopiazonic acid (CPA) biosynthesis, explaining the no productivity of CPA in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486

  15. GenomeRNAi: a database for cell-based RNAi phenotypes.

    PubMed

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.

  16. GenomeRNAi: a database for cell-based RNAi phenotypes

    PubMed Central

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at PMID:17135194

  17. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    PubMed

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.

  18. Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary

    2013-01-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149

  19. Using the Saccharomyces Genome Database (SGD) for analysis of genomic information

    PubMed Central

    Skrzypek, Marek S.; Hirschman, Jodi

    2011-01-01

    Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets. PMID:21901739

  20. Gene Expression Profiling of Bronchoalveolar Lavage Cells During Aspergillus Colonization of the Lung Allograft.

    PubMed

    Weigt, S Samuel; Wang, Xiaoyan; Palchevskiy, Vyacheslav; Patel, Naman; Derhovanessian, Ariss; Shino, Michael Y; Sayah, David M; Lynch, Joseph P; Saggar, Rajan; Ross, David J; Kubak, Bernie M; Ardehali, Abbas; Palmer, Scott; Husain, Shahid; Belperio, John A

    2018-06-01

    Aspergillus colonization after lung transplant is associated with an increased risk of chronic lung allograft dysfunction (CLAD). We hypothesized that gene expression during Aspergillus colonization could provide clues to CLAD pathogenesis. We examined transcriptional profiles in 3- or 6-month surveillance bronchoalveolar lavage fluid cell pellets from recipients with Aspergillus fumigatus colonization (n = 12) and without colonization (n = 10). Among the Aspergillus colonized, we also explored profiles in those who developed CLAD (n = 6) or remained CLAD-free (n = 6). Transcription profiles were assayed with the HG-U133 Plus 2.0 microarray (Affymetrix). Differential gene expression was based on an absolute fold difference of 2.0 or greater and unadjusted P value less than 0.05. We used NIH Database for Annotation, Visualization and Integrated Discovery for functional analyses, with false discovery rates less than 5% considered significant. Aspergillus colonization was associated with differential expression of 489 probe sets, representing 404 unique genes. "Defense response" genes and genes in the "cytokine-cytokine receptor" Kyoto Encyclopedia of Genes and Genomes pathway were notably enriched in this list. Among Aspergillus colonized patients, CLAD development was associated with differential expression of 69 probe sets, representing 64 unique genes. This list was enriched for genes involved in "immune response" and "response to wounding", among others. Notably, both chitinase 3-like-1 and chitotriosidase were associated with progression to CLAD. Aspergillus colonization is associated with gene expression profiles related to defense responses including cytokine signaling. Epithelial wounding, as well as the innate immune response to chitin that is present in the fungal cell wall, may be key in the link between Aspergillus colonization and CLAD.

  1. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    PubMed Central

    2010-01-01

    Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105

  2. Characterization of recombinant terrelysin, a hemolysin of Aspergillus terreus.

    PubMed

    Nayak, Ajay P; Blachere, Françoise M; Hettick, Justin M; Lukomski, Slawomir; Schmechel, Detlef; Beezhold, Donald H

    2011-01-01

    Fungal hemolysins are potential virulence factors. Some fungal hemolysins belong to the aegerolysin protein family that includes cytolysins capable of lysing erythrocytes and other cells. Here, we describe a hemolysin from Aspergillus terreus called terrelysin. We used the genome sequence database to identify the terrelysin sequence based on homology with other known aegerolysins. Aspergillus terreus mRNA was isolated, transcribed to cDNA and the open reading frame for terrelysin amplified by PCR using specific primers. Using the pASK-IBA6 cloning vector, we produced recombinant terrelysin (rTerrelysin) as a fusion product in Escherichia coli. The recombinant protein was purified and using MALDI-TOF MS determined to have a mass of 16,428 Da. Circular dichroism analysis suggests the secondary structure of the protein to be predominantly β-sheet. Results from thermal denaturation of rTerrelysin show that the protein maintained the β-sheet confirmation up to 65°C. Polyclonal antibody to rTerrelysin recognized a protein of approximately 16.5 kDa in mycelial extracts from A. terreus.

  3. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA

    PubMed Central

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A. Patrick; Coutts, Robert H. A.

    2015-01-01

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine–dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S–rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  4. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  5. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  6. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  7. A Ruby API to query the Ensembl database for genomic features.

    PubMed

    Strozzi, Francesco; Aerts, Jan

    2011-04-01

    The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api.

  8. Description of an orthologous cluster of ochratoxin A biosynthetic genes in Aspergillus and Penicillium species. A comparative analysis.

    PubMed

    Gil-Serna, Jessica; García-Díaz, Marta; González-Jaén, María Teresa; Vázquez, Covadonga; Patiño, Belén

    2018-03-02

    Ochratoxin A (OTA) is one of the most important mycotoxins due to its toxic properties and worldwide distribution which is produced by several Aspergillus and Penicillium species. The knowledge of OTA biosynthetic genes and understanding of the mechanisms involved in their regulation are essential. In this work, we obtained a clear picture of biosynthetic genes organization in the main OTA-producing Aspergillus and Penicillium species (A. steynii, A. westerdijkiae, A. niger, A. carbonarius and P. nordicum) using complete genome sequences obtained in this work or previously available on databases. The results revealed a region containing five ORFs which predicted five proteins: halogenase, bZIP transcription factor, cytochrome P450 monooxygenase, non-ribosomal peptide synthetase and polyketide synthase in all the five species. Genetic synteny was conserved in both Penicillium and Aspergillus species although genomic location seemed to be different since the clusters presented different flanking regions (except for A. steynii and A. westerdijkiae); these observations support the hypothesis of the orthology of this genomic region and that it might have been acquired by horizontal transfer. New real-time RT-PCR assays for quantification of the expression of these OTA biosynthetic genes were developed. In all species, the five genes were consistently expressed in OTA-producing strains in permissive conditions. These protocols might favour futures studies on the regulation of biosynthetic genes in order to develop new efficient control methods to avoid OTA entering the food chain. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Toward systems metabolic engineering of Aspergillus and Pichia species for the production of chemicals and biofuels.

    PubMed

    Caspeta, Luis; Nielsen, Jens

    2013-05-01

    Recently genome sequence data have become available for Aspergillus and Pichia species of industrial interest. This has stimulated the use of systems biology approaches for large-scale analysis of the molecular and metabolic responses of Aspergillus and Pichia under defined conditions, which has resulted in much new biological information. Case-specific contextualization of this information has been performed using comparative and functional genomic tools. Genomics data are also the basis for constructing genome-scale metabolic models, and these models have helped in the contextualization of knowledge on the fundamental biology of Aspergillus and Pichia species. Furthermore, with the availability of these models, the engineering of Aspergillus and Pichia is moving from traditional approaches, such as random mutagenesis, to a systems metabolic engineering approach. Here we review the recent trends in systems biology of Aspergillus and Pichia species, highlighting the relevance of these developments for systems metabolic engineering of these organisms for the production of hydrolytic enzymes, biofuels and chemicals from biomass. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. THGS: a web-based database of Transmembrane Helices in Genome Sequences

    PubMed Central

    Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

    2004-01-01

    Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375

  11. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  12. Uniform standards for genome databases in forest and fruit trees

    USDA-ARS?s Scientific Manuscript database

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  13. SoyBase, The USDA-ARS Soybean Genetics and Genomics Database

    USDA-ARS?s Scientific Manuscript database

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The...

  14. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  15. DroSpeGe: rapid access database for new Drosophila species genomes.

    PubMed

    Gilbert, Donald G

    2007-01-01

    The Drosophila species comparative genome database DroSpeGe (http://insects.eugenes.org/DroSpeGe/) provides genome researchers with rapid, usable access to 12 new and old Drosophila genomes, since its inception in 2004. Scientists can use, with minimal computing expertise, the wealth of new genome information for developing new insights into insect evolution. New genome assemblies provided by several sequencing centers have been annotated with known model organism gene homologies and gene predictions to provided basic comparative data. TeraGrid supplies the shared cyberinfrastructure for the primary computations. This genome database includes homologies to Drosophila melanogaster and eight other eukaryote model genomes, and gene predictions from several groups. BLAST searches of the newest assemblies are integrated with genome maps. GBrowse maps provide detailed views of cross-species aligned genomes. BioMart provides for data mining of annotations and sequences. Common chromosome maps identify major synteny among species. Potential gain and loss of genes is suggested by Gene Ontology groupings for genes of the new species. Summaries of essential genome statistics include sizes, genes found and predicted, homology among genomes, phylogenetic trees of species and comparisons of several gene predictions for sensitivity and specificity in finding new and known genes.

  16. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.

    PubMed

    Hallin, Peter F; Ussery, David W

    2004-12-12

    Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, DNA compositions like base skews, oligo skews and repeats at the local and global level are just a few of the analysis that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.

  17. An object model and database for functional genomics.

    PubMed

    Jones, Andrew; Hunt, Ela; Wastling, Jonathan M; Pizarro, Angel; Stoeckert, Christian J

    2004-07-10

    Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.

  18. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    PubMed Central

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  19. The COG database: a tool for genome-scale analysis of protein functions and evolution

    PubMed Central

    Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

    2000-01-01

    Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175

  20. BRAD, the genetics and genomics database for Brassica plants.

    PubMed

    Cheng, Feng; Liu, Shengyi; Wu, Jian; Fang, Lu; Sun, Silong; Liu, Bo; Li, Pingxia; Hua, Wei; Wang, Xiaowu

    2011-10-13

    Brassica species include both vegetable and oilseed crops, which are very important to the daily life of common human beings. Meanwhile, the Brassica species represent an excellent system for studying numerous aspects of plant biology, specifically for the analysis of genome evolution following polyploidy, so it is also very important for scientific research. Now, the genome of Brassica rapa has already been assembled, it is the time to do deep mining of the genome data. BRAD, the Brassica database, is a web-based resource focusing on genome scale genetic and genomic data for important Brassica crops. BRAD was built based on the first whole genome sequence and on further data analysis of the Brassica A genome species, Brassica rapa (Chiifu-401-42). It provides datasets, such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and from BAC clone sequences, predicted genes and associated annotations, non coding RNAs, transposable elements (TE), B. rapa genes' orthologous to those in A. thaliana, as well as genetic markers and linkage maps. BRAD offers useful searching and data mining tools, including search across annotation datasets, search for syntenic or non-syntenic orthologs, and to search the flanking regions of a certain target, as well as the tools of BLAST and Gbrowse. BRAD allows users to enter almost any kind of information, such as a B. rapa or A. thaliana gene ID, physical position or genetic marker. BRAD, a new database which focuses on the genetics and genomics of the Brassica plants has been developed, it aims at helping scientists and breeders to fully and efficiently use the information of genome data of Brassica plants. BRAD will be continuously updated and can be accessed through http://brassicadb.org.

  1. Haemophilus influenzae Genome Database (HIGDB): a single point web resource for Haemophilus influenzae.

    PubMed

    Swetha, Rayapadi G; Kala Sekar, Dinesh Kumar; Ramaiah, Sudha; Anbarasu, Anand; Sekar, Kanagaraj

    2014-12-01

    Haemophilus influenzae (H. Influenzae) is the causative agent of pneumonia, bacteraemia and meningitis. The organism is responsible for large number of deaths in both developed and developing countries. Even-though the first bacterial genome to be sequenced was that of H. Influenzae, there is no exclusive database dedicated for H. Influenzae. This prompted us to develop the Haemophilus influenzae Genome Database (HIGDB). All data of HIGDB are stored and managed in MySQL database. The HIGDB is hosted on Solaris server and developed using PERL modules. Ajax and JavaScript are used for the interface development. The HIGDB contains detailed information on 42,741 proteins, 18,077 genes including 10 whole genome sequences and also 284 three dimensional structures of proteins of H. influenzae. In addition, the database provides "Motif search" and "GBrowse". The HIGDB is freely accessible through the URL: http://bioserver1.physics.iisc.ernet.in/HIGDB/. The HIGDB will be a single point access for bacteriological, clinical, genomic and proteomic information of H. influenzae. The database can also be used to identify DNA motifs within H. influenzae genomes and to compare gene or protein sequences of a particular strain with other strains of H. influenzae. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. VitisExpDB: a database resource for grape functional genomics.

    PubMed

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-02-28

    The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of approximately 20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  3. VitisExpDB: A database resource for grape functional genomics

    PubMed Central

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-01-01

    Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of ~20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website . PMID:18307813

  4. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i. e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the shear volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.

  5. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  6. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  7. CnidBase: The Cnidarian Evolutionary Genomics Database

    PubMed Central

    Ryan, Joseph F.; Finnerty, John R.

    2003-01-01

    CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972

  8. EU Laws on Privacy in Genomic Databases and Biobanking.

    PubMed

    Townend, David

    2016-03-01

    Both the European Union and the Council of Europe have a bearing on privacy in genomic databases and biobanking. In terms of legislation, the processing of personal data as it relates to the right to privacy is currently largely regulated in Europe by Directive 95/46/EC, which requires that processing be "fair and lawful" and follow a set of principles, meaning that the data be processed only for stated purposes, be sufficient for the purposes of the processing, be kept only for so long as is necessary to achieve those purposes, and be kept securely and only in an identifiable state for such time as is necessary for the processing. The European privacy regime does not require the de-identification (anonymization) of personal data used in genomic databases or biobanks, and alongside this practice informed consent as well as governance and oversight mechanisms provide for the protection of genomic data. © 2016 American Society of Law, Medicine & Ethics.

  9. Outreach and online training services at the Saccharomyces Genome Database.

    PubMed

    MacPherson, Kevin A; Starr, Barry; Wong, Edith D; Dalusag, Kyla S; Hellerstedt, Sage T; Lang, Olivia W; Nash, Robert S; Skrzypek, Marek S; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    The Saccharomyces Genome Database (SGD; www.yeastgenome.org ), the primary genetics and genomics resource for the budding yeast S. cerevisiae , provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. http://www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  10. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    PubMed

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  11. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  12. Design and implementation of a database for Brucella melitensis genome annotation.

    PubMed

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To made these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can carry out different kinds of information by three levels of queries: (1) the basic search use the classical keywords or sequence identifiers; (2) the original advanced search engine allows to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.

  13. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    ERIC Educational Resources Information Center

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  14. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  15. MaizeGDB: The Maize Genetics and Genomics Database.

    USDA-ARS?s Scientific Manuscript database

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project’s website...

  16. Accessing the SEED genome databases via Web services API: tools for programmers.

    PubMed

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-06-14

    The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides Web services based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrate that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  17. Investigation of mutations in the HBB gene using the 1,000 genomes database.

    PubMed

    Carlice-Dos-Reis, Tânia; Viana, Jaime; Moreira, Fabiano Cordeiro; Cardoso, Greice de Lemos; Guerreiro, João; Santos, Sidney; Ribeiro-Dos-Santos, Ândrea

    2017-01-01

    Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, which 186 (7.4%) have a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.

  18. Diversity, Application, and Synthetic Biology of Industrially Important Aspergillus Fungi.

    PubMed

    Park, Hee-Soo; Jun, Sang-Cheol; Han, Kap-Hoon; Hong, Seung-Beom; Yu, Jae-Hyuk

    2017-01-01

    The filamentous fungal genus Aspergillus consists of over 340 officially recognized species. A handful of these Aspergillus fungi are predominantly used for food fermentation and large-scale production of enzymes, organic acids, and bioactive compounds. These industrially important Aspergilli primarily belong to the two major Aspergillus sections, Nigri and Flavi. Aspergillus oryzae (section Flavi) is the most commonly used mold for the fermentation of soybeans, rice, grains, and potatoes. Aspergillus niger (section Nigri) is used in the industrial production of various enzymes and organic acids, including 99% (1.4 million tons per year) of citric acid produced worldwide. Better understanding of the genomes and the signaling mechanisms of key Aspergillus species can help identify novel approaches to enhance these commercially significant strains. This review summarizes the diversity, current applications, key products, and synthetic biology of Aspergillus fungi commonly used in industry. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. RICD: a rice indica cDNA database resource for rice functional genomics.

    PubMed

    Lu, Tingting; Huang, Xuehui; Zhu, Chuanrang; Huang, Tao; Zhao, Qiang; Xie, Kabing; Xiong, Lizhong; Zhang, Qifa; Han, Bin

    2008-11-26

    The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.

  20. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencingmore » projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.« less

  1. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  2. SmedGD 2.0: The Schmidtea mediterranea genome database

    PubMed Central

    Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2016-01-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588

  3. Choosing a genome browser for a Model Organism Database: surveying the Maize community

    PubMed Central

    Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.

    2010-01-01

    As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860

  4. Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

    2010-06-01

    The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.

  5. Genome sequence of the white koji mold Aspergillus kawachii IFO 4308, used for brewing the Japanese distilled spirit shochu.

    PubMed

    Futagami, Taiki; Mori, Kazuki; Yamashita, Ayaka; Wada, Shotaro; Kajiwara, Yasuhiro; Takashita, Hideharu; Omori, Toshiro; Takegawa, Kaoru; Tashiro, Kosuke; Kuhara, Satoru; Goto, Masatoshi

    2011-11-01

    The filamentous fungus Aspergillus kawachii has traditionally been used for brewing the Japanese distilled spirit shochu. A. kawachii characteristically hyperproduces citric acid and a variety of polysaccharide glycoside hydrolases. Here the genome sequence of A. kawachii IFO 4308 was determined and annotated. Analysis of the sequence may provide insight into the properties of this fungus that make it superior for use in shochu production, leading to the further development of A. kawachii for industrial applications.

  6. DEPPDB - DNA electrostatic potential properties database. Electrostatic properties of genome DNA elements.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G

    2012-04-01

    Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular, related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastids genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. User sequences calculations are available. Case studies discovered electrostatics complementing DNA bending in E.coli plasmid BNT2 promoter functioning, possibly affecting host-environment metabolic switch. Transcription factors binding sites gravitate to high potential regions, confirming the electrostatics universal importance in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, exhibiting taxonomic correlations. The necessity of the genome electrostatic properties studies is discussed.

  7. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants.Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.

  8. A searchable database for the genome of Phomopsis longicolla (isolate MSPL 10-6).

    PubMed

    Darwish, Omar; Li, Shuxian; May, Zane; Matthews, Benjamin; Alkharouf, Nadim W

    2016-01-01

    Phomopsis longicolla (syn. Diaporthe longicolla) is an important seed-borne fungal pathogen that primarily causes Phomopsis seed decay (PSD) in most soybean production areas worldwide. This disease severely decreases soybean seed quality by reducing seed viability and oil quality, altering seed composition, and increasing frequencies of moldy and/or split beans. To facilitate investigation of the genetic base of fungal virulence factors and understand the mechanism of disease development, we designed and developed a database for P. longicolla isolate MSPL 10-6 that contains information about the genome assemblies (contigs), gene models, gene descriptions and GO functional ontologies. A web-based front end to the database was built using ASP.NET, which allows researchers to search and mine the genome of this important fungus. This database represents the first reported genome database for a seed borne fungal pathogen in the Diaporthe- Phomopsis complex. The database will also be a valuable resource for research and agricultural communities. It will aid in the development of new control strategies for this pathogen. http://bioinformatics.towson.edu/Phomopsis_longicolla/HomePage.aspx.

  9. A searchable database for the genome of Phomopsis longicolla (isolate MSPL 10-6)

    PubMed Central

    May, Zane; Matthews, Benjamin; Alkharouf, Nadim W.

    2016-01-01

    Phomopsis longicolla (syn. Diaporthe longicolla) is an important seed-borne fungal pathogen that primarily causes Phomopsis seed decay (PSD) in most soybean production areas worldwide. This disease severely decreases soybean seed quality by reducing seed viability and oil quality, altering seed composition, and increasing frequencies of moldy and/or split beans. To facilitate investigation of the genetic base of fungal virulence factors and understand the mechanism of disease development, we designed and developed a database for P. longicolla isolate MSPL 10-6 that contains information about the genome assemblies (contigs), gene models, gene descriptions and GO functional ontologies. A web-based front end to the database was built using ASP.NET, which allows researchers to search and mine the genome of this important fungus. This database represents the first reported genome database for a seed borne fungal pathogen in the Diaporthe– Phomopsis complex. The database will also be a valuable resource for research and agricultural communities. It will aid in the development of new control strategies for this pathogen. Availability: http://bioinformatics.towson.edu/Phomopsis_longicolla/HomePage.aspx PMID:28197060

  10. GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species

    PubMed Central

    Kumar, Sujai; Stevens, Lewis; Blaxter, Mark

    2017-01-01

    Abstract As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org PMID:28605774

  11. iMETHYL: an integrative database of human DNA methylation, gene expression, and genomic variation.

    PubMed

    Komaki, Shohei; Shiwa, Yuh; Furukawa, Ryohei; Hachiya, Tsuyoshi; Ohmomo, Hideki; Otomo, Ryo; Satoh, Mamoru; Hitomi, Jiro; Sobue, Kenji; Sasaki, Makoto; Shimizu, Atsushi

    2018-01-01

    We launched an integrative multi-omics database, iMETHYL (http://imethyl.iwate-megabank.org). iMETHYL provides whole-DNA methylation (~24 million autosomal CpG sites), whole-genome (~9 million single-nucleotide variants), and whole-transcriptome (>14 000 genes) data for CD4 + T-lymphocytes, monocytes, and neutrophils collected from approximately 100 subjects. These data were obtained from whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing, making iMETHYL a comprehensive database.

  12. RefSeq microbial genomes database: new representation and annotation strategy.

    PubMed

    Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

    2014-01-01

    The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

  13. CottonGen: a genomics, genetics and breeding database for cotton research

    USDA-ARS?s Scientific Manuscript database

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  14. Genomics reveals traces of fungal phenylpropanoid-flavonoid metabolic pathway in the f ilamentous fungus Aspergillus oryzae.

    PubMed

    Juvvadi, Praveen Rao; Seshime, Yasuyo; Kitamoto, Katsuhiko

    2005-12-01

    Fungal secondary metabolites constitute a wide variety of compounds which either play a vital role in agricultural, pharmaceutical and industrial contexts, or have devastating effects on agriculture, animal and human affairs by virtue of their toxigenicity. Owing to their beneficial and deleterious characteristics, these complex compounds and the genes responsible for their synthesis have been the subjects of extensive investigation by microbiologists and pharmacologists. A majority of the fungal secondary metabolic genes are classified as type I polyketide synthases (PKS) which are often clustered with other secondary metabolism related genes. In this review we discuss on the significance of our recent discovery of chalcone synthase (CHS) genes belonging to the type III PKS superfamily in an industrially important fungus, Aspergillus oryzae. CHS genes are known to play a vital role in the biosynthesis of flavonoids in plants. A comparative genome analyses revealed the unique character of A. oryzae with four CHS-like genes (csyA, csyB, csyC and csyD) amongst other Aspergilli (Aspergillus nidulans and Aspergillus fumigatus) which contained none of the CHS-like genes. Some other fungi such as Neurospora crassa, Fusarium graminearum, Magnaporthe grisea, Podospora anserina and Phanerochaete chrysosporium also contained putative type III PKSs, with a phylogenic distinction from bacteria and plants. The enzymatically active nature of these newly discovered homologues is expected owing to the conservation in the catalytic residues across the different species of plants and fungi, and also by the fact that a majority of these genes (csyA, csyB and csyD) were expressed in A. oryzae. While this finding brings filamentous fungi closer to plants and bacteria which until recently were the only ones considered to possess the type III PKSs, the presence of putative genes encoding other principal enzymes involved in the phenylpropanoid and flavonoid biosynthesis (viz

  15. Genome Sequence of the White Koji Mold Aspergillus kawachii IFO 4308, Used for Brewing the Japanese Distilled Spirit Shochu

    PubMed Central

    Futagami, Taiki; Mori, Kazuki; Yamashita, Ayaka; Wada, Shotaro; Kajiwara, Yasuhiro; Takashita, Hideharu; Omori, Toshiro; Takegawa, Kaoru; Tashiro, Kosuke; Kuhara, Satoru; Goto, Masatoshi

    2011-01-01

    The filamentous fungus Aspergillus kawachii has traditionally been used for brewing the Japanese distilled spirit shochu. A. kawachii characteristically hyperproduces citric acid and a variety of polysaccharide glycoside hydrolases. Here the genome sequence of A. kawachii IFO 4308 was determined and annotated. Analysis of the sequence may provide insight into the properties of this fungus that make it superior for use in shochu production, leading to the further development of A. kawachii for industrial applications. PMID:22045919

  16. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentousmore » ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.« less

  17. DRDB: An Online Date Palm Genomic Resource Database.

    PubMed

    He, Zilong; Zhang, Chengwei; Liu, Wanfei; Lin, Qiang; Wei, Ting; Aljohi, Hasan A; Chen, Wei-Hua; Hu, Songnian

    2017-01-01

    Background: Date palm ( Phoenix dactylifera L.) is a cultivated woody plant with agricultural and economic importance in many countries around the world. With the advantages of next generation sequencing technologies, genome sequences for many date palm cultivars have been released recently. Short sequence repeat (SSR) and single nucleotide polymorphism (SNP) can be identified from these genomic data, and have been proven to be very useful biomarkers in plant genome analysis and breeding. Results: Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly; among the SSRs, mononucleotide SSRs (58.92%) were the most abundant, followed by di- (29.92%), tri- (8.14%), tetra- (2.47%), penta- (0.36%), and hexa-nucleotide SSRs (0.19%). The high-quality PCR primer pairs were designed for most (174,497; 70.81% out of total) SSRs. We also annotated 6,375,806 SNPs with raw read depth≥3 in 90% cultivars. To further reduce false positive SNPs, we only kept 5,572,650 (87.40% out of total) SNPs with at least 20% cultivars support for downstream analyses. The high-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt - Sudan, and Middle East - South Asian, with Egypt - Sudan being the admixture of North Africa and Middle East - South Asian cultivars; we further confirmed these clusters using principal component analysis. Moreover, 34,346 SSRs and 4,177,778 SNPs with PCR primers were assigned to shared cultivars for cultivar classification and diversity analysis. All these SSRs, SNPs and their classification are available in our database, and can be used for cultivar identification, comparison, and molecular breeding. Conclusion: DRDB is a comprehensive

  18. DRDB: An Online Date Palm Genomic Resource Database

    PubMed Central

    He, Zilong; Zhang, Chengwei; Liu, Wanfei; Lin, Qiang; Wei, Ting; Aljohi, Hasan A.; Chen, Wei-Hua; Hu, Songnian

    2017-01-01

    Background: Date palm (Phoenix dactylifera L.) is a cultivated woody plant with agricultural and economic importance in many countries around the world. With the advantages of next generation sequencing technologies, genome sequences for many date palm cultivars have been released recently. Short sequence repeat (SSR) and single nucleotide polymorphism (SNP) can be identified from these genomic data, and have been proven to be very useful biomarkers in plant genome analysis and breeding. Results: Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly; among the SSRs, mononucleotide SSRs (58.92%) were the most abundant, followed by di- (29.92%), tri- (8.14%), tetra- (2.47%), penta- (0.36%), and hexa-nucleotide SSRs (0.19%). The high-quality PCR primer pairs were designed for most (174,497; 70.81% out of total) SSRs. We also annotated 6,375,806 SNPs with raw read depth≥3 in 90% cultivars. To further reduce false positive SNPs, we only kept 5,572,650 (87.40% out of total) SNPs with at least 20% cultivars support for downstream analyses. The high-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt – Sudan, and Middle East – South Asian, with Egypt – Sudan being the admixture of North Africa and Middle East – South Asian cultivars; we further confirmed these clusters using principal component analysis. Moreover, 34,346 SSRs and 4,177,778 SNPs with PCR primers were assigned to shared cultivars for cultivar classification and diversity analysis. All these SSRs, SNPs and their classification are available in our database, and can be used for cultivar identification, comparison, and molecular breeding. Conclusion: DRDB is a

  19. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C

    2008-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/

  20. Aspergillus flavus: human pathogen, allergen and mycotoxin producer.

    PubMed

    Hedayati, M T; Pasqualotto, A C; Warn, P A; Bowyer, P; Denning, D W

    2007-06-01

    Aspergillus infections have grown in importance in the last years. However, most of the studies have focused on Aspergillus fumigatus, the most prevalent species in the genus. In certain locales and hospitals, Aspergillus flavus is more common in air than A. fumigatus, for unclear reasons. After A. fumigatus, A. flavus is the second leading cause of invasive aspergillosis and it is the most common cause of superficial infection. Experimental invasive infections in mice show A. flavus to be 100-fold more virulent than A. fumigatus in terms of inoculum required. Particularly common clinical syndromes associated with A. flavus include chronic granulomatous sinusitis, keratitis, cutaneous aspergillosis, wound infections and osteomyelitis following trauma and inoculation. Outbreaks associated with A. flavus appear to be associated with single or closely related strains, in contrast to those associated with A. fumigatus. In addition, A. flavus produces aflatoxins, the most toxic and potent hepatocarcinogenic natural compounds ever characterized. Accurate species identification within Aspergillus flavus complex remains difficult due to overlapping morphological and biochemical characteristics, and much taxonomic and population genetics work is necessary to better understand the species and related species. The flavus complex currently includes 23 species or varieties, including two sexual species, Petromyces alliaceus and P. albertensis. The genome of the highly related Aspergillus oryzae is completed and available; that of A. flavus in the final stages of annotation. Our understanding of A. flavus lags far behind that of A. fumigatus. Studies of the genomics, taxonomy, population genetics, pathogenicity, allergenicity and antifungal susceptibility of A. flavus are all required.

  1. Development of a genome editing technique using the CRISPR/Cas9 system in the industrial filamentous fungus Aspergillus oryzae.

    PubMed

    Katayama, Takuya; Tanaka, Yuki; Okabe, Tomoya; Nakamura, Hidetoshi; Fujii, Wataru; Kitamoto, Katsuhiko; Maruyama, Jun-Ichi

    2016-04-01

    To develop a genome editing method using the CRISPR/Cas9 system in Aspergillus oryzae, the industrial filamentous fungus used in Japanese traditional fermentation and for the production of enzymes and heterologous proteins. To develop the CRISPR/Cas9 system as a genome editing technique for A. oryzae, we constructed plasmids expressing the gene encoding Cas9 nuclease and single guide RNAs for the mutagenesis of target genes. We introduced these into an A. oryzae strain and obtained transformants containing mutations within each target gene that exhibited expected phenotypes. The mutational rates ranged from 10 to 20 %, and 1 bp deletions or insertions were the most commonly induced mutations. We developed a functional and versatile genome editing method using the CRISPR/Cas9 system in A. oryzae. This technique will contribute to the use of efficient targeted mutagenesis in many A. oryzae industrial strains.

  2. Genomic analysis of the aconidial and high-performance protein producer, industrially relevant Aspergillus niger SH2 strain.

    PubMed

    Yin, Chao; Wang, Bin; He, Pan; Lin, Ying; Pan, Li

    2014-05-15

    Aspergillus niger is usually regarded as a beneficial species widely used in biotechnological industry. Obtaining the genome sequence of the widely used aconidial A. niger SH2 strain is of great importance to understand its unusual production capability. In this study we assembled a high-quality genome sequence of A. niger SH2 with approximately 11,517 ORFs. Relatively high proportion of genes enriched for protein expression related FunCat items verify its efficient capacity in protein production. Furthermore, genome-wide comparative analysis between A. niger SH2 and CBS513.88 reveals insights into unique properties of A. niger SH2. A. niger SH2 lacks the gene related with the initiation of asexual sporulation (PrpA), leading to its distinct aconidial phenotype. Frame shift mutations and non-synonymous SNPs in genes of cell wall integrity signaling, β-1,3-glucan synthesis and chitin synthesis influence its cell wall development which is important for its hyphal fragmentation during industrial high-efficiency protein production. Copyright © 2014 Elsevier B.V. All rights reserved.

  3. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    PubMed

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variations have been well-studied in some bacteria there seems a lot of other species of prokaryotes yet to be investigated for SSR mediated adaptive and other evolutionary advantages. As a part of our on-going studies on SSR polymorphism in prokaryotes we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes and extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.

  4. Proteomics of eukaryotic microorganisms: The medically and biotechnologically important fungal genus Aspergillus.

    PubMed

    Kniemeyer, Olaf

    2011-08-01

    Fungal species of the genus Aspergillus play significant roles as model organisms in basic research, as "cell factories" for the production of organic acids, pharmaceuticals or industrially important enzymes and as pathogens causing superficial and invasive infections in animals and humans. The release of the genome sequences of several Aspergillus sp. has paved the way for global analyses of protein expression in Aspergilli including the characterisation of proteins, which have not designated any function. With the application of proteomic methods, particularly 2-D gel and LC-MS/MS-based methods, first insights into the composition of the proteome of Aspergilli under different growth and stress conditions could be gained. Putative targets of global regulators led to the improvement of industrially relevant Aspergillus strains and so far not described Aspergillus antigens have already been discovered. Here, I review the recent proteome data generated for the species Aspergillus nidulans, Aspergillus fumigatus, Aspergillus niger, Aspergillus terreus, Aspergillus flavus and Aspergillus oryzae. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists

    PubMed Central

    Wiley, Laura K.; Sivley, R. Michael; Bush, William S.

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist PMID:23894185

  6. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

    PubMed

    Wiley, Laura K; Sivley, R Michael; Bush, William S

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.

  7. Allergens/Antigens, toxins and polyketides of important Aspergillus species.

    PubMed

    Bhetariya, Preetida J; Madan, Taruna; Basir, Seemi Farhat; Varma, Anupam; Usha, Sarma P

    2011-04-01

    The medical, agricultural and biotechnological importance of the primitive eukaryotic microorganisms, the Fungi was recognized way back in 1920. Among various groups of fungi, the Aspergillus species are studied in great detail using advances in genomics and proteomics to unravel biological and molecular mechanisms in these fungi. Aspergillus fumigatus, Aspergillus flavus, Aspergillus niger, Aspergillus parasiticus, Aspergillus nidulans and Aspergillus terreus are some of the important species relevant to human, agricultural and biotechnological applications. The potential of Aspergillus species to produce highly diversified complex biomolecules such as multifunctional proteins (allergens, antigens, enzymes) and polyketides is fascinating and demands greater insight into the understanding of these fungal species for application to human health. Recently a regulator gene for secondary metabolites, LaeA has been identified. Gene mining based on LaeA has facilitated new metabolites with antimicrobial activity such as emericellamides and antitumor activity such as terrequinone A from A. nidulans. Immunoproteomic approach was reported for identification of few novel allergens for A. fumigatus. In this context, the review is focused on recent developments in allergens, antigens, structural and functional diversity of the polyketide synthases that produce polyketides of pharmaceutical and biological importance. Possible antifungal drug targets for development of effective antifungal drugs and new strategies for development of molecular diagnostics are considered.

  8. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2008-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the Minimum Information about a Genome Sequence’ (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/ PMID:17981842

  9. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    PubMed Central

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  10. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  11. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    PubMed

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.

  12. Aspergillus niger contains the cryptic phylogenetic species A. awamori.

    PubMed

    Perrone, Giancarlo; Stea, Gaetano; Epifani, Filomena; Varga, János; Frisvad, Jens C; Samson, Robert A

    2011-11-01

    Aspergillus section Nigri is an important group of species for food and medical mycology, and biotechnology. The Aspergillus niger 'aggregate' represents its most complicated taxonomic subgroup containing eight morphologically indistinguishable taxa: A. niger, Aspergillus tubingensis, Aspergillus acidus, Aspergillus brasiliensis, Aspergillus costaricaensis, Aspergillus lacticoffeatus, Aspergillus piperis, and Aspergillus vadensis. Aspergillus awamori, first described by Nakazawa, has been compared taxonomically with other black aspergilli and recently it has been treated as a synonym of A. niger. Phylogenetic analyses of sequences generated from portions of three genes coding for the proteins β-tubulin (benA), calmodulin (CaM), and the translation elongation factor-1 alpha (TEF-1α) of a population of A. niger strains isolated from grapes in Europe revealed the presence of a cryptic phylogenetic species within this population, A. awamori. Morphological, physiological, ecological and chemical data overlap occurred between A. niger and the cryptic A. awamori, however the splitting of these two species was also supported by AFLP analysis of the full genome. Isolates in both phylospecies can produce the mycotoxins ochratoxin A and fumonisin B₂, and they also share the production of pyranonigrin A, tensidol B, funalenone, malformins, and naphtho-γ-pyrones. In addition, sequence analysis of four putative A. awamori strains from Japan, used in the koji industrial fermentation, revealed that none of these strains belong to the A. awamori phylospecies. Copyright © 2011 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  13. Mouse Genome Database: From sequence to phenotypes and disease models

    PubMed Central

    Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326

  14. Expanded national database collection and data coverage in the FINDbase worldwide database for clinically relevant genomic variation allele frequencies

    PubMed Central

    Viennas, Emmanouil; Komianou, Angeliki; Mizzi, Clint; Stojiljkovic, Maja; Mitropoulou, Christina; Muilu, Juha; Vihinen, Mauno; Grypioti, Panagiota; Papadaki, Styliani; Pavlidis, Cristiana; Zukic, Branka; Katsila, Theodora; van der Spek, Peter J.; Pavlovic, Sonja; Tzimas, Giannis; Patrinos, George P.

    2017-01-01

    FINDbase (http://www.findbase.org) is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in well-distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications to clinical pharmacogenomics. Also, we report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes development curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer (http://www.getpivot.com) software (iii) new data visualization tools and (iv) the interrelation of FINDbase with DruGeVar database with direct implications in clinical pharmacogenomics. The abovementioned updates further enhance the impact of FINDbase, as a key resource for Genomic Medicine applications. PMID:27924022

  15. Evaluation of Aspergillus PCR protocols for testing serum specimens.

    PubMed

    White, P Lewis; Mengoli, Carlo; Bretagne, Stéphane; Cuenca-Estrella, Manuel; Finnstrom, Niklas; Klingspor, Lena; Melchers, Willem J G; McCulloch, Elaine; Barnes, Rosemary A; Donnelly, J Peter; Loeffler, Juergen

    2011-11-01

    A panel of human serum samples spiked with various amounts of Aspergillus fumigatus genomic DNA was distributed to 23 centers within the European Aspergillus PCR Initiative to determine analytical performance of PCR. Information regarding specific methodological components and PCR performance was requested. The information provided was made anonymous, and meta-regression analysis was performed to determine any procedural factors that significantly altered PCR performance. Ninety-seven percent of protocols were able to detect a threshold of 10 genomes/ml on at least one occasion, with 83% of protocols reproducibly detecting this concentration. Sensitivity and specificity were 86.1% and 93.6%, respectively. Positive associations between sensitivity and the use of larger sample volumes, an internal control PCR, and PCR targeting the internal transcribed spacer (ITS) region were shown. Negative associations between sensitivity and the use of larger elution volumes (≥100 μl) and PCR targeting the mitochondrial genes were demonstrated. Most Aspergillus PCR protocols used to test serum generate satisfactory analytical performance. Testing serum requires less standardization, and the specific recommendations shown in this article will only improve performance.

  16. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-04

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. BioQ: tracing experimental origins in public genomic databases using a novel data provenance model.

    PubMed

    Saccone, Scott F; Quan, Jiaxi; Jones, Peter L

    2012-04-15

    Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. BioQ is freely available to the public at http://bioq.saclab.net.

  18. Saccharomyces genome database informs human biology

    PubMed Central

    Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Karra, Kalpana; Binkley, Gail; Simison, Matt; Miyasato, Stuart R

    2018-01-01

    Abstract The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. PMID:29140510

  19. Saccharomyces genome database informs human biology.

    PubMed

    Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael

    2018-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes.

    PubMed

    Bhawna; Bonthala, V S; Gajula, Mnv Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely.Database URL: http://www.multiomics.in/PvTFDB/. © The Author(s) 2016. Published by Oxford University Press.

  1. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  2. The COG database: new developments in phylogenetic classification of proteins from complete genomes

    PubMed Central

    Tatusov, Roman L.; Natale, Darren A.; Garkavtsev, Igor V.; Tatusova, Tatiana A.; Shankavaram, Uma T.; Rao, Bachoti S.; Kiryutin, Boris; Galperin, Michael Y.; Fedorova, Natalie D.; Koonin, Eugene V.

    2001-01-01

    The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt on a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis. PMID:11125040

  3. [The first case of persistent vaginitis due to Aspergillus protuberus in an immunocompetent patient].

    PubMed

    Borsa, Barış Ata; Özgün, Gonca; Houbraken, Jos; Ökmen, Fırat

    2015-01-01

    ITS regions were amplified and sequenced from isolated DNA for genomic characterization. The obtained sequences were compared with the NCBI database and internal databases of the CBS-KNAW Fungal Biodiversity Centre and confirmed as Aspergillus section Versicolores. As a result of recent changes in classification of fungi, analysis of partial β-tubulin and calmodulin sequences have also been used to obtain a detailed and precise characterization. Eventually, the strain has been identified as A.protuberus which is a recently accepted species distinct from Aspergillus section Versicolores. As the patient could not be contacted after the preliminary report, detailed demographical information, probable origin and route of transmission of the agent and prognosis of infection remained obscure. In conclusion, the first case of vaginitis caused by A.protuberus was described in this report with the support of clinical, pathological, microbiological and molecular data.

  4. Exploring Protein Function Using the Saccharomyces Genome Database.

    PubMed

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.

  5. Molecular Genetics Information System (MOLGENIS): alternatives in developing local experimental genomics databases.

    PubMed

    Swertz, Morris A; De Brock, E O; Van Hijum, Sacha A F T; De Jong, Anne; Buist, Girbe; Baerends, Richard J S; Kok, Jan; Kuipers, Oscar P; Jansen, Ritsert C

    2004-09-01

    Genomic research laboratories need adequate infrastructure to support management of their data production and research workflow. But what makes infrastructure adequate? A lack of appropriate criteria makes any decision on buying or developing a system difficult. Here, we report on the decision process for the case of a molecular genetics group establishing a microarray laboratory. Five typical requirements for experimental genomics database systems were identified: (i) evolution ability to keep up with the fast developing genomics field; (ii) a suitable data model to deal with local diversity; (iii) suitable storage of data files in the system; (iv) easy exchange with other software; and (v) low maintenance costs. The computer scientists and the researchers of the local microarray laboratory considered alternative solutions for these five requirements and chose the following options: (i) use of automatic code generation; (ii) a customized data model based on standards; (iii) storage of datasets as black boxes instead of decomposing them in database tables; (iv) loosely linking to other programs for improved flexibility; and (v) a low-maintenance web-based user interface. Our team evaluated existing microarray databases and then decided to build a new system, Molecular Genetics Information System (MOLGENIS), implemented using code generation in a period of three months. This case can provide valuable insights and lessons to both software developers and a user community embarking on large-scale genomic projects. http://www.molgenis.nl

  6. The need for high-quality whole-genome sequence databases in microbial forensics.

    PubMed

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  7. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter

    2011-06-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regionsmore » have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.« less

  8. MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Noubibou, Octave; Haase, Dirk; Yang, Li; Gundlach, Heidrun; Hindemitt, Tobias; Klee, Kathrin; Haberer, Georg; Schoof, Heiko; Mayer, Klaus F. X.

    2007-01-01

    Genome-oriented plant research delivers rapidly increasing amount of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analysis and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and in addition to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at . PMID:17202173

  9. The mitochondnal genome of Aspergillus nidulans contains reading frames homologous to the human URFs 1 and 4.

    PubMed Central

    Brown, T A; Davies, R W; Ray, J A; Waring, R B; Scazzocchio, C

    1983-01-01

    A 2830-bp segment of the mitochondrial genome of the fungus Aspergillus nidulans was sequenced and shown to contain two unidentified reading frames (URFs). These reading frames are 352 and 488 codons in length, and would specify unmodified proteins of mol. wts. 39,000 and 54,000, respectively. The derived amino acid sequences indicate that these genes are equivalent to the human mitochondrial URFs 1 and 4, with 39% amino acid homology for URF1 and 26% for URF4. Both URFs were shown by secondary structure predictions to code for predominantly beta-sheeted proteins with strong structural conservation between the fungal and human homologues. Counterparts of mammalian URFs have not previously been identified in non-mammalian genomes, and the discovery that A. nidulans possesses reading frames so closely homologous with URF1 and URF4 shows that these genes are of general functional importance in the mitochondria of diverse species. PMID:11894959

  10. The Human Ageing Genomic Resources: online databases and tools for biogerontologists

    PubMed Central

    de Magalhães, João Pedro; Budovsky, Arie; Lehmann, Gilad; Costa, Joana; Li, Yang; Fraifeld, Vadim; Church, George M.

    2009-01-01

    Summary Ageing is a complex, challenging phenomenon that will require multiple, interdisciplinary approaches to unravel its puzzles. To assist basic research on ageing, we developed the Human Ageing Genomic Resources (HAGR). This work provides an overview of the databases and tools in HAGR and describes how the gerontology research community can employ them. Several recent changes and improvements to HAGR are also presented. The two centrepieces in HAGR are GenAge and AnAge. GenAge is a gene database featuring genes associated with ageing and longevity in model organisms, a curated database of genes potentially associated with human ageing, and a list of genes tested for their association with human longevity. A myriad of biological data and information is included for hundreds of genes, making GenAge a reference for research that reflects our current understanding of the genetic basis of ageing. GenAge can also serve as a platform for the systems biology of ageing, and tools for the visualization of protein-protein interactions are also included. AnAge is a database of ageing in animals, featuring over 4,000 species, primarily assembled as a resource for comparative and evolutionary studies of ageing. Longevity records, developmental and reproductive traits, taxonomic information, basic metabolic characteristics, and key observations related to ageing are included in AnAge. Software is also available to aid researchers in the form of Perl modules to automate numerous tasks and as an SPSS script to analyse demographic mortality data. The Human Ageing Genomic Resources are available online at http://genomics.senescence.info. PMID:18986374

  11. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  12. The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video.

    PubMed

    Harper, Lisa C; Schaeffer, Mary L; Thistle, Jordan; Gardiner, Jack M; Andorf, Carson M; Campbell, Darwin A; Cannon, Ethalinda K S; Braun, Bremen L; Birkett, Scott M; Lawrence, Carolyn J; Sen, Taner Z

    2011-01-01

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is 'Using the MaizeGDB Genome Browser', which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database

  13. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼ 19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts' knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .

  15. Evaluation of Aspergillus PCR Protocols for Testing Serum Specimens▿†

    PubMed Central

    White, P. Lewis; Mengoli, Carlo; Bretagne, Stéphane; Cuenca-Estrella, Manuel; Finnstrom, Niklas; Klingspor, Lena; Melchers, Willem J. G.; McCulloch, Elaine; Barnes, Rosemary A.; Donnelly, J. Peter; Loeffler, Juergen

    2011-01-01

    A panel of human serum samples spiked with various amounts of Aspergillus fumigatus genomic DNA was distributed to 23 centers within the European Aspergillus PCR Initiative to determine analytical performance of PCR. Information regarding specific methodological components and PCR performance was requested. The information provided was made anonymous, and meta-regression analysis was performed to determine any procedural factors that significantly altered PCR performance. Ninety-seven percent of protocols were able to detect a threshold of 10 genomes/ml on at least one occasion, with 83% of protocols reproducibly detecting this concentration. Sensitivity and specificity were 86.1% and 93.6%, respectively. Positive associations between sensitivity and the use of larger sample volumes, an internal control PCR, and PCR targeting the internal transcribed spacer (ITS) region were shown. Negative associations between sensitivity and the use of larger elution volumes (≥100 μl) and PCR targeting the mitochondrial genes were demonstrated. Most Aspergillus PCR protocols used to test serum generate satisfactory analytical performance. Testing serum requires less standardization, and the specific recommendations shown in this article will only improve performance. PMID:21940479

  16. BioQ: tracing experimental origins in public genomic databases using a novel data provenance model

    PubMed Central

    Saccone, Scott F.; Quan, Jiaxi; Jones, Peter L.

    2012-01-01

    Motivation: Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data. Results: We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects. Availability and implementation: BioQ is freely available to the public at http://bioq.saclab.net Contact: ssaccone@wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22426342

  17. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  18. A Utility Maximizing and Privacy Preserving Approach for Protecting Kinship in Genomic Databases.

    PubMed

    Kale, Gulce; Ayday, Erman; Tastan, Oznur

    2017-09-12

    Rapid and low cost sequencing of genomes enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such information is kinship. We define two routes kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed.We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in high risks of kinship privacy, whereas the sharing data from further relatives together is often safer. We also show arrival order of family members have a high impact on the level of privacy risks and on the utility of sharing data. Available at: https://github.com/tastanlab/Kinship-Privacy. erman@cs.bilkent.edu.tr or oznur.tastan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R.; Cherry, J. Michael

    2013-01-01

    The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186

  20. Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.

    PubMed

    Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

    2011-01-01

    Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.

  1. Comparison of the genomes and transcriptomes associated with the different protease secretions of Aspergillus oryzae 100-8 and 3.042.

    PubMed

    Zhao, Guozhong; Yao, Yunping; Hou, Lihua; Wang, Chunling; Cao, Xiaohong

    2014-10-01

    Aspergillus oryzae is used to produce traditional fermented foods and beverages. A. oryzae 3.042 produces a neutral protease and an alkaline protease but rarely an acid protease, which is unfavourable to soy-sauce fermentation. A. oryzae 100-8 was obtained by N(+) ion implantation mutagenesis of A. oryzae 3.042, and the protease secretions of these two strains are different. Sequencing the genome of A. oryzae 100-8 and comparing it to the genomes of A. oryzae 100-8 and 3.042 revealed some differences, such as single nucleotide polymorphisms, nucleotide deletion or insertion. Some of these differences may reflect the ability of A. oryzae to secrete proteases. Transcriptional sequencing and analysis of the two strains during the same growth processes provided further insights into the genes and pathways involved in protease secretion.

  2. Genomic Approach to Understand the Association of DNA Repair with Longevity and Healthy Aging Using Genomic Databases of Oldest-Old Population

    PubMed Central

    Kim, Hyun Soo

    2018-01-01

    Aged population is increasing worldwide due to the aging process that is inevitable. Accordingly, longevity and healthy aging have been spotlighted to promote social contribution of aged population. Many studies in the past few decades have reported the process of aging and longevity, emphasizing the importance of maintaining genomic stability in exceptionally long-lived population. Underlying reason of longevity remains unclear due to its complexity involving multiple factors. With advances in sequencing technology and human genome-associated approaches, studies based on population-based genomic studies are increasing. In this review, we summarize recent longevity and healthy aging studies of human population focusing on DNA repair as a major factor in maintaining genome integrity. To keep pace with recent growth in genomic research, aging- and longevity-associated genomic databases are also briefly introduced. To suggest novel approaches to investigate longevity-associated genetic variants related to DNA repair using genomic databases, gene set analysis was conducted, focusing on DNA repair- and longevity-associated genes. Their biological networks were additionally analyzed to grasp major factors containing genetic variants of human longevity and healthy aging in DNA repair mechanisms. In summary, this review emphasizes DNA repair activity in human longevity and suggests approach to conduct DNA repair-associated genomic study on human healthy aging.

  3. Extension modules for storage, visualization and querying of genomic, genetic and breeding data in Tripal databases

    PubMed Central

    Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie

    2017-01-01

    Abstract Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/

  4. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    PubMed

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.

  5. The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video

    PubMed Central

    Harper, Lisa C.; Schaeffer, Mary L.; Thistle, Jordan; Gardiner, Jack M.; Andorf, Carson M.; Campbell, Darwin A.; Cannon, Ethalinda K.S.; Braun, Bremen L.; Birkett, Scott M.; Lawrence, Carolyn J.; Sen, Taner Z.

    2011-01-01

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is ‘Using the MaizeGDB Genome Browser’, which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database URL: http://www.maizegdb.org/ PMID:21565781

  6. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of millions of nucleic acids sequences, which are currently deposited in international databases (public or private); these databases contain information of genes, RNA, ORF, proteins, intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs; which were renewed in the use of new mathematical methods, and the introduction of the use of artificial intelligence. In addition to the constant creation of supercomputing units trained to withstand the heavy workload of sequence analysis. However, it is still necessary the innovation on platforms that allow genomic analyses, faster and more effectively, with a technological understanding of all biological processes.

  7. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future. PMID:21948792

  8. BμG@Sbase--a microbial gene expression and comparative genomic database.

    PubMed

    Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future.

  9. Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    PubMed Central

    Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; van de Vondervoort, Peter J.I.; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; van Dijck, Piet W.M.; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert J.J.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noël N.M.E.; Roubos, Johannes A.; Nielsen, Jens; Baker, Scott E.

    2011-01-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole-genome sequencing of the acidogenic A. niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 Mb of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis supported up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases, and protein transporters in the protein producing CBS 513.88 strain. Our results and data sets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi. PMID:21543515

  10. A novel non-thermostable deuterolysin from Aspergillus oryzae.

    PubMed

    Maeda, Hiroshi; Katase, Toru; Sakai, Daisuke; Takeuchi, Michio; Kusumoto, Ken-Ichi; Amano, Hitoshi; Ishida, Hiroki; Abe, Keietsu; Yamagata, Youhei

    2016-09-01

    Three putative deuterolysin (EC 3.4.24.29) genes (deuA, deuB, and deuC) were found in the Aspergillus oryzae genome database ( http://www.bio.nite.go.jp/dogan/project/view/AO ). One of these genes, deuA, was corresponding to NpII gene, previously reported. DeuA and DeuB were overexpressed by recombinant A. oryzae and were purified. The degradation profiles against protein substrates of both enzymes were similar, but DeuB showed wider substrate specificity against peptidyl MCA-substrates compared with DeuA. Enzymatic profiles of DeuB except for thermostability also resembled those of DeuA. DeuB was inactivated by heat treatment above 80° C, different from thermostable DeuA. Transcription analysis in wild type A. oryzae showed only deuB was expressed in liquid culture, and the addition of the proteinous substrate upregulated the transcription. Furthermore, the NaNO3 addition seems to eliminate the effect of proteinous substrate for the transcription of deuB.

  11. GEAR: A database of Genomic Elements Associated with drug Resistance.

    PubMed

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-03-15

    Drug resistance is becoming a serious problem that leads to the failure of standard treatments, which is generally developed because of genetic mutations of certain molecules. Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance) that aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Right now, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships are firstly extracted from primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org.

  12. GEAR: A database of Genomic Elements Associated with drug Resistance

    PubMed Central

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-01-01

    Drug resistance is becoming a serious problem that leads to the failure of standard treatments, which is generally developed because of genetic mutations of certain molecules. Here, we present GEAR (A database of Genomic Elements Associated with drug Resistance) that aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Right now, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships are firstly extracted from primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org. PMID:28294141

  13. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

    PubMed

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons-codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.

  14. Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

    PubMed Central

    Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

    2014-01-01

    The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the amount and type encoded in fungal genomes varies considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. On the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134

  15. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. © 2015 Wiley Periodicals, Inc.

  16. Updates to the Cool Season Food Legume Genome Database: Resources for pea, lentil, faba bean and chickpea genetics, genomics and breeding

    USDA-ARS?s Scientific Manuscript database

    The Cool Season Food Legume Genome database (CSFL, www.coolseasonfoodlegume.org) is an online resource for genomics, genetics, and breeding research for chickpea, lentil,pea, and faba bean. The user-friendly and curated website allows for all publicly available map,marker,trait, gene,transcript, ger...

  17. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

    2010-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathways reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718

  18. Determining Epigenetic Targets: A Beginner's Guide to Identifying Genome Functionality Through Database Analysis.

    PubMed

    Hay, Elizabeth A; Cowie, Philip; MacKenzie, Alasdair

    2017-01-01

    There can now be little doubt that the cis-regulatory genome represents the largest information source within the human genome essential for health. In addition to containing up to five times more information than the coding genome, the cis-regulatory genome also acts as a major reservoir of disease-associated polymorphic variation. The cis-regulatory genome, which is comprised of enhancers, silencers, promoters, and insulators, also acts as a major functional target for epigenetic modification including DNA methylation and chromatin modifications. These epigenetic modifications impact the ability of cis-regulatory sequences to maintain tissue-specific and inducible expression of genes that preserve health. There has been limited ability to identify and characterize the functional components of this huge and largely misunderstood part of the human genome that, for decades, was ignored as "Junk" DNA. In an attempt to address this deficit, the current chapter will first describe methods of identifying and characterizing functional elements of the cis-regulatory genome at a genome-wide level using databases such as ENCODE, the UCSC browser, and NCBI. We will then explore the databases on the UCSC genome browser, which provides access to DNA methylation and chromatin modification datasets. Finally, we will describe how we can superimpose the huge volume of study data contained in the NCBI archives onto that contained within the UCSC browser in order to glean relevant in vivo study data for any locus within the genome. An ability to access and utilize these information sources will become essential to informing the future design of experiments and subsequent determination of the role of epigenetics in health and disease and will form a critical step in our development of personalized medicine.

  19. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database

    PubMed Central

    Drabkin, Harold J.; Blake, Judith A.

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as ‘GO’ or ‘homology’ or ‘phenotype’. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as ‘papers selected for GO that refer to genes with NO GO annotation’. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations

  20. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported

  1. Comparative genomic analysis identified a mutation related to enhanced heterologous protein production in the filamentous fungus Aspergillus oryzae.

    PubMed

    Jin, Feng-Jie; Katayama, Takuya; Maruyama, Jun-Ichi; Kitamoto, Katsuhiko

    2016-11-01

    Genomic mapping of mutations using next-generation sequencing technologies has facilitated the identification of genes contributing to fundamental biological processes, including human diseases. However, few studies have used this approach to identify mutations contributing to heterologous protein production in industrial strains of filamentous fungi, such as Aspergillus oryzae. In a screening of A. oryzae strains that hyper-produce human lysozyme (HLY), we previously isolated an AUT1 mutant that showed higher production of various heterologous proteins; however, the underlying factors contributing to the increased heterologous protein production remained unclear. Here, using a comparative genomic approach performed with whole-genome sequences, we attempted to identify the genes responsible for the high-level production of heterologous proteins in the AUT1 mutant. The comparative sequence analysis led to the detection of a gene (AO090120000003), designated autA, which was predicted to encode an unknown cytoplasmic protein containing an alpha/beta-hydrolase fold domain. Mutation or deletion of autA was associated with higher production levels of HLY. Specifically, the HLY yields of the autA mutant and deletion strains were twofold higher than that of the control strain during the early stages of cultivation. Taken together, these results indicate that combining classical mutagenesis approaches with comparative genomic analysis facilitates the identification of novel genes involved in heterologous protein production in filamentous fungi.

  2. Scientific Advances with Aspergillus Species that Are Used for Food and Biotech Applications.

    PubMed

    Biesebeke, Rob Te; Record, Erik

    2008-01-01

    Yeast and filamentous fungi have been used for centuries in diverse biotechnological processes. Fungal fermentation technology is traditionally used in relation to food production, such as for bread, beer, cheese, sake and soy sauce. Last century, the industrial application of yeast and filamentous fungi expanded rapidly, with excellent examples such as purified enzymes and secondary metabolites (e.g. antibiotics), which are used in a wide range of food as well as non-food industries. Research on protein and/or metabolite secretion by fungal species has focused on identifying bottlenecks in (post-) transcriptional regulation of protein production, metabolic rerouting, morphology and the transit of proteins through the secretion pathway. In past years, genome sequencing of some fungi (e.g. Aspergillus oryzae, Aspergillus niger) has been completed. The available genome sequences have enabled identification of genes and functionally important regions of the genome. This has directed research to focus on a post-genomics era in which transcriptomics, proteomics and metabolomics methodologies will help to explore the scientific relevance and industrial application of fungal genome sequences.

  3. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome

    PubMed Central

    Sarika; Arora, Vasu; Iquebal, M. A.; Rai, Anil; Kumar, Dinesh

    2013-01-01

    Molecular markers play a significant role for crop improvement in desirable characteristics, such as high yield, resistance to disease and others that will benefit the crop in long term. Pigeonpea (Cajanus cajan L.) is the recently sequenced legume by global consortium led by ICRISAT (Hyderabad, India) and been analysed for gene prediction, synteny maps, markers, etc. We present PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for pigeonpea genome, based on chromosome wise as well as location wise search of primers. Total of 123 387 Short Tandem Repeats (STRs) were extracted from pigeonpea genome, available in public domain using MIcroSAtellite tool (MISA). The database is an online relational database based on ‘three-tier architecture’ that catalogues information of microsatellites in MySQL and user-friendly interface is developed using PHP. Search for STRs may be customized by limiting their location on chromosome as well as number of markers in that range. This is a novel approach and is not been implemented in any of the existing marker database. This database has been further appended with Primer3 for primer designing of selected markers with left and right flankings of size up to 500 bp. This will enable researchers to select markers of choice at desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region over chromosome to narrow down location of gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, markers’ search based on characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/ PMID:23396298

  4. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome.

    PubMed

    Sarika; Arora, Vasu; Iquebal, M A; Rai, Anil; Kumar, Dinesh

    2013-01-01

    Molecular markers play a significant role for crop improvement in desirable characteristics, such as high yield, resistance to disease and others that will benefit the crop in long term. Pigeonpea (Cajanus cajan L.) is the recently sequenced legume by global consortium led by ICRISAT (Hyderabad, India) and been analysed for gene prediction, synteny maps, markers, etc. We present PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for pigeonpea genome, based on chromosome wise as well as location wise search of primers. Total of 123 387 Short Tandem Repeats (STRs) were extracted from pigeonpea genome, available in public domain using MIcroSAtellite tool (MISA). The database is an online relational database based on 'three-tier architecture' that catalogues information of microsatellites in MySQL and user-friendly interface is developed using PHP. Search for STRs may be customized by limiting their location on chromosome as well as number of markers in that range. This is a novel approach and is not been implemented in any of the existing marker database. This database has been further appended with Primer3 for primer designing of selected markers with left and right flankings of size up to 500 bp. This will enable researchers to select markers of choice at desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region over chromosome to narrow down location of gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, markers' search based on characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/

  5. Purification and enzymatic characterization of secretory glycoside hydrolase family 3 (GH3) aryl β-glucosidases screened from Aspergillus oryzae genome.

    PubMed

    Kudo, Kanako; Watanabe, Akira; Ujiie, Seiryu; Shintani, Takahiro; Gomi, Katsuya

    2015-12-01

    By a global search of the genome database of Aspergillus oryzae, we found 23 genes encoding putative β-glucosidases, among which 10 genes with a signal peptide belonging to glycoside hydrolase family 3 (GH3) were overexpressed in A. oryzae using the improved glaA gene promoter. Consequently, crude enzyme preparations from three strains, each harboring the genes AO090038000223 (bglA), AO090103000127 (bglF), and AO090003001511 (bglJ), showed a substrate preference toward p-nitrophenyl-β-d-glucopyranoside (pNPGlc) and thus were purified to homogeneity and enzymatically characterized. All the purified enzymes (BglA, BglF, and BglJ) preferentially hydrolyzed aryl β-glycosides, including pNPGlc, rather than cellobiose, and these enzymes were proven to be aryl β-glucosidases. Although the specific activity of BglF toward all the substrates tested was significantly low, BglA and BglJ showed appreciably high activities toward pNPGlc and arbutin. The kinetic parameters of BglA and BglJ for pNPGlc suggested that both the enzymes had relatively higher hydrolytic activity toward pNPGlc among the fungal β-glucosidases reported. The thermal and pH stabilities of BglA were higher than those of BglJ, and BglA was particularly stable in a wide pH range (pH 4.5-10). In contrast, BglJ was the most heat- and alkaline-labile among the three β-glucosidases. Furthermore, BglA was more tolerant to ethanol than BglJ; as a result, it showed much higher hydrolytic activity toward isoflavone glycosides in the presence of ethanol than BglJ. This study suggested that the mining of novel β-glucosidases exhibiting higher activity from microbial genome sequences is of great use for the production of beneficial compounds such as isoflavone aglycones. Copyright © 2015 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  6. The porcine translational research database: A manually curated, genomics and proteomics-based research resource

    USDA-ARS?s Scientific Manuscript database

    The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic- and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation to study pig models that are...

  7. TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes

    PubMed Central

    González, Abel D.; Espinosa, Vladimir; Vasconcelos, Ana T.; Pérez-Rueda, Ernesto; Collado-Vides, Julio

    2005-01-01

    Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. TRACTOR_DB may be currently accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact Email id is tractor@cifn.unam.mx. PMID:15608293

  8. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    PubMed

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  9. Developing genomic knowledge bases and databases to support clinical management: current perspectives

    PubMed Central

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward. PMID:25276091

  10. A RESTful application programming interface for the PubMLST molecular typing and genome databases

    PubMed Central

    Bray, James E.; Maiden, Martin C. J.

    2017-01-01

    Abstract Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research in to population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452

  11. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    PubMed

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationship within and among groups of species in a tree of life based on genomic data.

  12. Genome Characterization of Oleaginous Aspergillus oryzae BCC7051: A Potential Fungal-Based Platform for Lipid Production

    DOE PAGES

    Thammarongtham, Chinae; Nookaew, Intawat; Vorapreeda, Tayvich; ...

    2017-09-01

    The selected robust fungus, Aspergillus oryzae strain BCC7051 is of interest for biotechnological production of lipid-derived products due to its capability to accumulate high amount of intracellular lipids using various sugars and agro-industrial substrates. Here in this paper, we report the genome sequence of the oleaginous A. oryzae BCC7051. The obtained reads were de novo assembled into 25 scaffolds spanning of 38,550,958 bps with predicted 11,456 protein-coding genes. By synteny mapping, a large rearrangement was found in two scaffolds of A. oryzae BCC7051 as compared to the reference RIB40 strain. The genetic relationship between BCC7051 and other strains of A.more » oryzae in terms of aflatoxin production was investigated, indicating that the A. oryzae BCC7051 was categorized into group 2 nonaflatoxin-producing strain. Moreover, a comparative analysis of the structural genes focusing on the involvement in lipid metabolism among oleaginous yeast and fungi revealed the presence of multiple isoforms of metabolic enzymes responsible for fatty acid synthesis in BCC7051. The alternative routes of acetyl-CoA generation as oleaginous features and malate/citrate/pyruvate shuttle were also identified in this A. oryzae strain. The genome sequence generated in this work is a dedicated resource for expanding genome-wide study of microbial lipids at systems level, and developing the fungal-based platform for production of diversified lipids with commercial relevance.« less

  13. Genome Characterization of Oleaginous Aspergillus oryzae BCC7051: A Potential Fungal-Based Platform for Lipid Production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thammarongtham, Chinae; Nookaew, Intawat; Vorapreeda, Tayvich

    The selected robust fungus, Aspergillus oryzae strain BCC7051 is of interest for biotechnological production of lipid-derived products due to its capability to accumulate high amount of intracellular lipids using various sugars and agro-industrial substrates. Here in this paper, we report the genome sequence of the oleaginous A. oryzae BCC7051. The obtained reads were de novo assembled into 25 scaffolds spanning of 38,550,958 bps with predicted 11,456 protein-coding genes. By synteny mapping, a large rearrangement was found in two scaffolds of A. oryzae BCC7051 as compared to the reference RIB40 strain. The genetic relationship between BCC7051 and other strains of A.more » oryzae in terms of aflatoxin production was investigated, indicating that the A. oryzae BCC7051 was categorized into group 2 nonaflatoxin-producing strain. Moreover, a comparative analysis of the structural genes focusing on the involvement in lipid metabolism among oleaginous yeast and fungi revealed the presence of multiple isoforms of metabolic enzymes responsible for fatty acid synthesis in BCC7051. The alternative routes of acetyl-CoA generation as oleaginous features and malate/citrate/pyruvate shuttle were also identified in this A. oryzae strain. The genome sequence generated in this work is a dedicated resource for expanding genome-wide study of microbial lipids at systems level, and developing the fungal-based platform for production of diversified lipids with commercial relevance.« less

  14. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-04

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a tendency in the study of disease-causing genetic variants. Nevertheless, there still lacks a repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. © The Author(s) 2016. Published by Oxford University Press.

  16. MetReS, an Efficient Database for Genomic Applications.

    PubMed

    Vilaplana, Jordi; Alves, Rui; Solsona, Francesc; Mateo, Jordi; Teixidó, Ivan; Pifarré, Marc

    2018-02-01

    MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database performance could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis. The study was started with a relational database, MySQL, which is the current database server used by the applications. We also tested the performance of an alternative data-handling framework, Apache Hadoop. Hadoop is currently used for large-scale data processing. We found that this data handling framework is likely to greatly improve the efficiency of the MetReS applications as the dataset and the processing needs increase by several orders of magnitude, as expected to happen in the near future.

  17. MaizeGDB: The Maize Genetics and Genomics Database.

    PubMed

    Harper, Lisa; Gardiner, Jack; Andorf, Carson; Lawrence, Carolyn J

    2016-01-01

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website ( http://www.maizegdb.org ) are custom interfaces enabling researchers to browse data and to seek out specific information matching explicit search criteria. In addition, pre-compiled reports are made available for particular types of data and bulletin boards are provided to facilitate communication and coordination among members of the community of maize geneticists.

  18. Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production

    PubMed Central

    Yin, Xian; Shin, Hyun-dong; Li, Jianghua; Du, Guocheng; Liu, Long; Chen, Jian

    2017-01-01

    Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of citrate-producing strains—namely, A. niger H915-1 (citrate titer: 157 g L−1), A1 (117 g L−1), and L2 (76 g L−1)—to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation but neither influence on pH decrease nor difference in citrate production were observed. PMID:28106122

  19. Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

    PubMed

    Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami

    2018-01-19

    Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .

  20. On the way toward systems biology of Aspergillus fumigatus infection.

    PubMed

    Albrecht, Daniela; Kniemeyer, Olaf; Mech, Franziska; Gunzer, Matthias; Brakhage, Axel; Guthke, Reinhard

    2011-06-01

    Pathogenicity of Aspergillus fumigatus is multifactorial. Thus, global studies are essential for the understanding of the infection process. Therefore, a data warehouse was established where genome sequence, transcriptome and proteome data are stored. These data are analyzed for the elucidation of virulence determinants. The data analysis workflow starts with pre-processing including imputing of missing values and normalization. Last step is the identification of differentially expressed genes/proteins as interesting candidates for further analysis, in particular for functional categorization and correlation studies. Sequence data and other prior knowledge extracted from databases are integrated to support the inference of gene regulatory networks associated with pathogenicity. This knowledge-assisted data analysis aims at establishing mathematical models with predictive strength to assist further experimental work. Recently, first steps were done to extend the integrative data analysis and computational modeling by evaluating spatio-temporal data (movies) that monitor interactions of A. fumigatus morphotypes (e.g. conidia) with host immune cells. Copyright © 2011 Elsevier GmbH. All rights reserved.

  1. The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research.

    PubMed

    Waugh, M; Hraber, P; Weller, J; Wu, Y; Chen, G; Inman, J; Kiphart, D; Sobral, B

    2000-01-01

    The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant pathogenic oomycete, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http:www.ncgr.org/pgi, with database access through the database access hyperlink.

  2. HpBase: A genome database of a sea urchin, Hemicentrotus pulcherrimus.

    PubMed

    Kinjo, Sonoko; Kiyomoto, Masato; Yamamoto, Takashi; Ikeo, Kazuho; Yaguchi, Shunsuke

    2018-04-01

    To understand the mystery of life, it is important to accumulate genomic information for various organisms because the whole genome encodes the commands for all the genes. Since the genome of Strongylocentrotus purpratus was sequenced in 2006 as the first sequenced genome in echinoderms, the genomic resources of other North American sea urchins have gradually been accumulated, but no sea urchin genomes are available in other areas, where many scientists have used the local species and reported important results. In this manuscript, we report a draft genome of the sea urchin Hemincentrotus pulcherrimus because this species has a long history as the target of developmental and cell biology in East Asia. The genome of H. pulcherrimus was assembled into 16,251 scaffold sequences with an N50 length of 143 kbp, and approximately 25,000 genes were identified in the genome. The size of the genome and the sequencing coverage were estimated to be approximately 800 Mbp and 100×, respectively. To provide these data and information of annotation, we constructed a database, HpBase (http://cell-innovation.nig.ac.jp/Hpul/). In HpBase, gene searches, genome browsing, and blast searches are available. In addition, HpBase includes the "recipes" for experiments from each lab using H. pulcherrimus. These recipes will continue to be updated according to the circumstances of individual scientists and can be powerful tools for experimental biologists and for the community. HpBase is a suitable dataset for evolutionary, developmental, and cell biologists to compare H. pulcherrimus genomic information with that of other species and to isolate gene information. © 2018 Japanese Society of Developmental Biologists.

  3. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism where the exon of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enable higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  4. Integrated database for identifying candate genes for Aspergillus flavus resistance in maize

    USDA-ARS?s Scientific Manuscript database

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent af...

  5. Gee Fu: a sequence version and web-services database tool for genomic assembly, genome feature and NGS data.

    PubMed

    Ramirez-Gonzalez, Ricardo; Caccamo, Mario; MacLean, Daniel

    2011-10-01

    Scientists now use high-throughput sequencing technologies and short-read assembly methods to create draft genome assemblies in just days. Tools and pipelines like the assembler, and the workflow management environments make it easy for a non-specialist to implement complicated pipelines to produce genome assemblies and annotations very quickly. Such accessibility results in a proliferation of assemblies and associated files, often for many organisms. These assemblies get used as a working reference by lots of different workers, from a bioinformatician doing gene prediction or a bench scientist designing primers for PCR. Here we describe Gee Fu, a database tool for genomic assembly and feature data, including next-generation sequence alignments. Gee Fu is an instance of a Ruby-On-Rails web application on a feature database that provides web and console interfaces for input, visualization of feature data via AnnoJ, access to data through a web-service interface, an API for direct data access by Ruby scripts and access to feature data stored in BAM files. Gee Fu provides a platform for storing and sharing different versions of an assembly and associated features that can be accessed and updated by bench biologists and bioinformaticians in ways that are easy and useful for each. http://tinyurl.com/geefu dan.maclean@tsl.ac.uk.

  6. Random Mutagenesis of the Aspergillus oryzae Genome Results in Fungal Antibacterial Activity

    PubMed Central

    Leonard, Cory A.; Brown, Stacy D.; Hayman, J. Russell

    2013-01-01

    Multidrug-resistant bacteria cause severe infections in hospitals and communities. Development of new drugs to combat resistant microorganisms is needed. Natural products of microbial origin are the source of most currently available antibiotics. We hypothesized that random mutagenesis of Aspergillus oryzae would result in secretion of antibacterial compounds. To address this hypothesis, we developed a screen to identify individual A. oryzae mutants that inhibit the growth of Methicillin-resistant Staphylococcus aureus (MRSA) in vitro. To randomly generate A. oryzae mutant strains, spores were treated with ethyl methanesulfonate (EMS). Over 3000 EMS-treated A. oryzae cultures were tested in the screen, and one isolate, CAL220, exhibited altered morphology and antibacterial activity. Culture supernatant from this isolate showed antibacterial activity against Methicillin-sensitive Staphylococcus aureus, MRSA, and Pseudomonas aeruginosa, but not Klebsiella pneumonia or Proteus vulgaris. The results of this study support our hypothesis and suggest that the screen used is sufficient and appropriate to detect secreted antibacterial fungal compounds resulting from mutagenesis of A. oryzae. Because the genome of A. oryzae has been sequenced and systems are available for genetic transformation of this organism, targeted as well as random mutations may be introduced to facilitate the discovery of novel antibacterial compounds using this system. PMID:23983696

  7. Random Mutagenesis of the Aspergillus oryzae Genome Results in Fungal Antibacterial Activity.

    PubMed

    Leonard, Cory A; Brown, Stacy D; Hayman, J Russell

    2013-01-01

    Multidrug-resistant bacteria cause severe infections in hospitals and communities. Development of new drugs to combat resistant microorganisms is needed. Natural products of microbial origin are the source of most currently available antibiotics. We hypothesized that random mutagenesis of Aspergillus oryzae would result in secretion of antibacterial compounds. To address this hypothesis, we developed a screen to identify individual A. oryzae mutants that inhibit the growth of Methicillin-resistant Staphylococcus aureus (MRSA) in vitro. To randomly generate A. oryzae mutant strains, spores were treated with ethyl methanesulfonate (EMS). Over 3000 EMS-treated A. oryzae cultures were tested in the screen, and one isolate, CAL220, exhibited altered morphology and antibacterial activity. Culture supernatant from this isolate showed antibacterial activity against Methicillin-sensitive Staphylococcus aureus, MRSA, and Pseudomonas aeruginosa, but not Klebsiella pneumonia or Proteus vulgaris. The results of this study support our hypothesis and suggest that the screen used is sufficient and appropriate to detect secreted antibacterial fungal compounds resulting from mutagenesis of A. oryzae. Because the genome of A. oryzae has been sequenced and systems are available for genetic transformation of this organism, targeted as well as random mutations may be introduced to facilitate the discovery of novel antibacterial compounds using this system.

  8. Genomics of Aspergillus oryzae: Learning from the History of Koji Mold and Exploration of Its Future

    PubMed Central

    Machida, Masayuki; Yamada, Osamu; Gomi, Katsuya

    2008-01-01

    At a time when the notion of microorganisms did not exist, our ancestors empirically established methods for the production of various fermentation foods: miso (bean curd seasoning) and shoyu (soy sauce), both of which have been widely used and are essential for Japanese cooking, and sake, a magical alcoholic drink consumed at a variety of ritual occasions, are typical examples. A filamentous fungus, Aspergillus oryzae, is the key organism in the production of all these traditional foods, and its solid-state cultivation (SSC) has been confirmed to be the secret for the high productivity of secretory hydrolases vital for the fermentation process. Indeed, our genome comparison and transcriptome analysis uncovered mechanisms for effective degradation of raw materials in SSC: the extracellular hydrolase genes that have been found only in the A. oryzae genome but not in A. fumigatus are highly induced during SSC but not in liquid cultivation. Also, the temperature reduction process empirically adopted in the traditional soy-sauce fermentation processes has been found to be important to keep strong expression of the A. oryzae-specific extracellular hydrolases. One of the prominent potentials of A. oryzae is that it has been successfully applied to effective degradation of biodegradable plastic. Both cutinase, responsible for the degradation of plastic, and hydrophobin, which recruits cutinase on the hydrophobic surface to enhance degradation, have been discovered in A. oryzae. Genomic analysis in concert with traditional knowledge and technology will continue to be powerful tools in the future exploration of A. oryzae. PMID:18820080

  9. Heterologous and endogenous U6 snRNA promoters enable CRISPR/Cas9 mediated genome editing in Aspergillus niger.

    PubMed

    Zheng, Xiaomei; Zheng, Ping; Sun, Jibin; Kun, Zhang; Ma, Yanhe

    2018-01-01

    U6 promoters have been used for single guide RNA (sgRNA) transcription in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein (CRISPR/Cas9) genome editing system. However, no available U6 promoters have been identified in Aspergillus niger, which is an important industrial platform for organic acid and protein production. Two CRISPR/Cas9 systems established in A. niger have recourse to the RNA polymerase II promoter or in vitro transcription for sgRNA synthesis, but these approaches generally increase cloning efforts and genetic manipulation. The validation of functional RNA polymerase II promoters is therefore an urgent need for A. niger . Here, we developed a novel CRISPR/Cas9 system in A. niger for sgRNA expression, based on one endogenous U6 promoter and two heterologous U6 promoters. The three tested U6 promoters enabled sgRNA transcription and the disruption of the polyketide synthase albA gene in A. niger . Furthermore, this system enabled highly efficient gene insertion at the targeted genome loci in A. niger using donor DNAs with homologous arms as short as 40-bp. This study demonstrated that both heterologous and endogenous U6 promoters were functional for sgRNA expression in A. niger . Based on this result, a novel and simple CRISPR/Cas9 toolbox was established in A. niger, that will benefit future gene functional analysis and genome editing.

  10. PGG.Population: a database for understanding the genomic diversity and genetic ancestry of human populations

    PubMed Central

    Zhang, Chao; Gao, Yang; Liu, Jiaojiao; Xue, Zhe; Lu, Yan; Deng, Lian; Tian, Lei; Feng, Qidi

    2018-01-01

    Abstract There are a growing number of studies focusing on delineating genetic variations that are associated with complex human traits and diseases due to recent advances in next-generation sequencing technologies. However, identifying and prioritizing disease-associated causal variants relies on understanding the distribution of genetic variations within and among populations. The PGG.Population database documents 7122 genomes representing 356 global populations from 107 countries and provides essential information for researchers to understand human genomic diversity and genetic ancestry. These data and information can facilitate the design of research studies and the interpretation of results of both evolutionary and medical studies involving human populations. The database is carefully maintained and constantly updated when new data are available. We included miscellaneous functions and a user-friendly graphical interface for visualization of genomic diversity, population relationships (genetic affinity), ancestral makeup, footprints of natural selection, and population history etc. Moreover, PGG.Population provides a useful feature for users to analyze data and visualize results in a dynamic style via online illustration. The long-term ambition of the PGG.Population, together with the joint efforts from other researchers who contribute their data to our database, is to create a comprehensive depository of geographic and ethnic variation of human genome, as well as a platform bringing influence on future practitioners of medicine and clinical investigators. PGG.Population is available at https://www.pggpopulation.org. PMID:29112749

  11. Aspergillus: introduction

    USDA-ARS?s Scientific Manuscript database

    Species in the genus Aspergillus possess versatile metabolic activities that impact our daily life both positively and negatively. Aspergillus flavus and Aspergillus oryzae are closely related fungi. While the former is able to produce carcinogenic aflatoxins and is an etiological agent of aspergill...

  12. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video

    USDA-ARS?s Scientific Manuscript database

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At the Maize Genetics and Genomics Database (MaizeGDB), we have developed a number of video tutorials that aim to demonstrate how to use various tools as well as to explici...

  13. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    PubMed

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. Integrated Database And Knowledge Base For Genomic Prospective Cohort Study In Tohoku Medical Megabank Toward Personalized Prevention And Medicine.

    PubMed

    Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun

    2015-01-01

    The Tohoku Medical Megabank project is a national project to revitalization of the disaster area in the Tohoku region by the Great East Japan Earthquake, and have conducted large-scale prospective genome-cohort study. Along with prospective genome-cohort study, we have developed integrated database and knowledge base which will be key database for realizing personalized prevention and medicine.

  15. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic study. Reflecting the huge diversity of microbial world, the number of microbial genome projects now becomes several thousands. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides the conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, efficient incremental updating procedure can create extended ortholog table by adding additional genomes to the default ortholog table generated from the representative set of genomes. Combining with the functionalities of the dynamic orthology calculation of any specified set of organisms, MBGD is an efficient and flexible tool for exploring the microbial genome diversity.

  16. LDSplitDB: a database for studies of meiotic recombination hotspots in MHC using human genomic data.

    PubMed

    Guo, Jing; Chen, Hao; Yang, Peng; Lee, Yew Ti; Wu, Min; Przytycka, Teresa M; Kwoh, Chee Keong; Zheng, Jie

    2018-04-20

    Meiotic recombination happens during the process of meiosis when chromosomes inherited from two parents exchange genetic materials to generate chromosomes in the gamete cells. The recombination events tend to occur in narrow genomic regions called recombination hotspots. Its dysregulation could lead to serious human diseases such as birth defects. Although the regulatory mechanism of recombination events is still unclear, DNA sequence polymorphisms have been found to play crucial roles in the regulation of recombination hotspots. To facilitate the studies of the underlying mechanism, we developed a database named LDSplitDB which provides an integrative and interactive data mining and visualization platform for the genome-wide association studies of recombination hotspots. It contains the pre-computed association maps of the major histocompatibility complex (MHC) region in the 1000 Genomes Project and the HapMap Phase III datasets, and a genome-scale study of the European population from the HapMap Phase II dataset. Besides the recombination profiles, related data of genes, SNPs and different types of epigenetic modifications, which could be associated with meiotic recombination, are provided for comprehensive analysis. To meet the computational requirement of the rapidly increasing population genomics data, we prepared a lookup table of 400 haplotypes for recombination rate estimation using the well-known LDhat algorithm which includes all possible two-locus haplotype configurations. To the best of our knowledge, LDSplitDB is the first large-scale database for the association analysis of human recombination hotspots with DNA sequence polymorphisms. It provides valuable resources for the discovery of the mechanism of meiotic recombination hotspots. The information about MHC in this database could help understand the roles of recombination in human immune system. DATABASE URL: http://histone.scse.ntu.edu.sg/LDSplitDB.

  17. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases.Comparative genome analysis tools lead to further understanding of evolution processes quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring an order to this genome sequence shockwave and improve the usability of associated data.

  18. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements

    PubMed Central

    Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.

    2017-01-01

    The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040

  19. Exploring human disease using the Rat Genome Database.

    PubMed

    Shimoyama, Mary; Laulederkind, Stanley J F; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R; Tutaj, Marek; Petri, Victoria; Hayman, G Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R

    2016-10-01

    Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers - within and beyond the rat community - who are particularly interested in leveraging rat-based insights to understand human diseases. © 2016. Published by The Company of Biologists Ltd.

  20. Characteristic clinical features of Aspergillus appendicitis: Case report and literature review.

    PubMed

    Gjeorgjievski, Mihajlo; Amin, Mitual B; Cappell, Mitchell S

    2015-11-28

    This work aims to facilitate diagnosing Aspergillus appendicitis, which can be missed clinically due to its rarity, by proposing a clinical pentad for Aspergillus appendicitis based on literature review and one new case. The currently reported case of pathologically-proven Aspergillus appendicitis was identified by computerized search of pathology database at William Beaumont Hospital, 1999-2014. Prior cases were identified by computerized literature search. Among 10980 pathology reports of pathologically-proven appendicitis, one case of Aspergillus appendicitis was identified (rate = 0.01%). A young boy with profound neutropenia, recent chemotherapy, and acute myelogenous leukemia presented with right lower quadrant pain, pyrexia, and generalized malaise. Abdominal computed tomography scan showed a thickened appendiceal wall and periappendiceal inflammation, suggesting appendicitis. Emergent laparotomy showed an inflamed, thickened appendix, which was resected. The patient did poorly postoperatively with low-grade-fevers while receiving antibacterial therapy, but rapidly improved after initiating amphotericin therapy. Microscopic examination of a silver stain of the appendectomy specimen revealed fungi with characteristic Aspergillus morphology, findings confirmed by immunohistochemistry. Primary Aspergillus appendicitis is exceptionally rare, with only 3 previously reported cases. All three cases presented with (1)-neutropenia, (2)-recent chemotherapy, (3)-acute leukemia, and (4)-suspected appendicitis; (5)-the two prior cases initially treated with antibacterial therapy, fared poorly before instituting anti-Aspergillus therapy. The current patient satisfied all these five criteria. Based on these four cases, a clinical pentad is proposed for Aspergillus appendicitis: clinically-suspected appendicitis, neutropenia, recent chemotherapy, acute leukemia, and poor clinical response if treated solely by antibacterial/anti-candidial therapy. Patients presenting with

  1. EUCANEXT: an integrated database for the exploration of genomic and transcriptomic data from Eucalyptus species

    PubMed Central

    Nascimento, Leandro Costa; Salazar, Marcela Mendes; Lepikson-Neto, Jorge; Camargo, Eduardo Leal Oliveira; Parreiras, Lucas Salera; Carazzolle, Marcelo Falsarella

    2017-01-01

    Abstract Tree species of the genus Eucalyptus are the most valuable and widely planted hardwoods in the world. Given the economic importance of Eucalyptus trees, much effort has been made towards the generation of specimens with superior forestry properties that can deliver high-quality feedstocks, customized to the industrýs needs for both cellulosic (paper) and lignocellulosic biomass production. In line with these efforts, large sets of molecular data have been generated by several scientific groups, providing invaluable information that can be applied in the development of improved specimens. In order to fully explore the potential of available datasets, the development of a public database that provides integrated access to genomic and transcriptomic data from Eucalyptus is needed. EUCANEXT is a database that analyses and integrates publicly available Eucalyptus molecular data, such as the E. grandis genome assembly and predicted genes, ESTs from several species and digital gene expression from 26 RNA-Seq libraries. The database has been implemented in a Fedora Linux machine running MySQL and Apache, while Perl CGI was used for the web interfaces. EUCANEXT provides a user-friendly web interface for easy access and analysis of publicly available molecular data from Eucalyptus species. This integrated database allows for complex searches by gene name, keyword or sequence similarity and is publicly accessible at http://www.lge.ibi.unicamp.br/eucalyptusdb. Through EUCANEXT, users can perform complex analysis to identify genes related traits of interest using RNA-Seq libraries and tools for differential expression analysis. Moreover, all the bioinformatics pipeline here described, including the database schema and PERL scripts, are readily available and can be applied to any genomic and transcriptomic project, regardless of the organism. Database URL: http://www.lge.ibi.unicamp.br/eucalyptusdb PMID:29220468

  2. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    PubMed

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Clustered array of ochratoxin A biosynthetic genes in Aspergillus steynii and their expression patterns in permissive conditions.

    PubMed

    Gil-Serna, Jessica; Vázquez, Covadonga; González-Jaén, María Teresa; Patiño, Belén

    2015-12-02

    Aspergillus steynii is probably the most relevant species of section Circumdati producing ochratoxin A (OTA). This mycotoxin contaminates a wide number of commodities and it is highly toxic for humans and animals. Little is known on the biosynthetic genes and their regulation in Aspergillus species. In this work, we identified and analysed three contiguous genes in A. steynii using 5'-RACE and genome walking approaches which predicted a cytochrome P450 monooxygenase (p450ste), a non-ribosomal peptide synthetase (nrpsste) and a polyketide synthase (pksste). These three genes were contiguous within a 20742 bp long genomic DNA fragment. Their corresponding cDNA were sequenced and their expression was analysed in three A. steynii strains using real time RT-PCR specific assays in permissive conditions in in vitro cultures. OTA was also analysed in these cultures. Comparative analyses of predicted genomic, cDNA and amino acid sequences were performed with sequences of similar gene functions. All the results obtained in these analyses were consistent and point out the involvement of these three genes in OTA biosynthesis by A. steynii and showed a co-ordinated expression pattern. This is the first time that a clustered organization OTA biosynthetic genes has been reported in Aspergillus genus. The results also suggested that this situation might be common in Aspergillus OTA-producing species and distinct to the one described for Penicillium species. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. TOPSAN: a dynamic web database for structural genomics.

    PubMed

    Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

    2011-01-01

    The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.

  5. Comparison of the genomic sequence of the microminipig, a novel breed of swine, with the genomic database for conventional pig.

    PubMed

    Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide

    2014-01-01

    The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using a next-generation sequencer SOLiD™ system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tag from the microminipig to conventional pig genomic sequences was greater than 96% and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as Single Nucleotide Polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  6. PGG.Population: a database for understanding the genomic diversity and genetic ancestry of human populations.

    PubMed

    Zhang, Chao; Gao, Yang; Liu, Jiaojiao; Xue, Zhe; Lu, Yan; Deng, Lian; Tian, Lei; Feng, Qidi; Xu, Shuhua

    2018-01-04

    There are a growing number of studies focusing on delineating genetic variations that are associated with complex human traits and diseases due to recent advances in next-generation sequencing technologies. However, identifying and prioritizing disease-associated causal variants relies on understanding the distribution of genetic variations within and among populations. The PGG.Population database documents 7122 genomes representing 356 global populations from 107 countries and provides essential information for researchers to understand human genomic diversity and genetic ancestry. These data and information can facilitate the design of research studies and the interpretation of results of both evolutionary and medical studies involving human populations. The database is carefully maintained and constantly updated when new data are available. We included miscellaneous functions and a user-friendly graphical interface for visualization of genomic diversity, population relationships (genetic affinity), ancestral makeup, footprints of natural selection, and population history etc. Moreover, PGG.Population provides a useful feature for users to analyze data and visualize results in a dynamic style via online illustration. The long-term ambition of the PGG.Population, together with the joint efforts from other researchers who contribute their data to our database, is to create a comprehensive depository of geographic and ethnic variation of human genome, as well as a platform bringing influence on future practitioners of medicine and clinical investigators. PGG.Population is available at https://www.pggpopulation.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Cpf1-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cpf1.

    PubMed

    Park, Jeongbin; Bae, Sangsu

    2018-03-15

    Following the type II CRISPR-Cas9 system, type V CRISPR-Cpf1 endonucleases have been found to be applicable for genome editing in various organisms in vivo. However, there are as yet no web-based tools capable of optimally selecting guide RNAs (gRNAs) among all possible genome-wide target sites. Here, we present Cpf1-Database, a genome-wide gRNA library design tool for LbCpf1 and AsCpf1, which have DNA recognition sequences of 5'-TTTN-3' at the 5' ends of target sites. Cpf1-Database provides a sophisticated but simple way to design gRNAs for AsCpf1 nucleases on the genome scale. One can easily access the data using a straightforward web interface, and using the powerful collections feature one can easily design gRNAs for thousands of genes in short time. Free access at http://www.rgenome.net/cpf1-database/. sangsubae@hanyang.ac.kr.

  8. A-WINGS: an integrated genome database for Pleurocybella porrigens (Angel's wing oyster mushroom, Sugihiratake).

    PubMed

    Yamamoto, Naoki; Suzuki, Tomohiro; Kobayashi, Masaaki; Dohra, Hideo; Sasaki, Yohei; Hirai, Hirofumi; Yokoyama, Koji; Kawagishi, Hirokazu; Yano, Kentaro

    2014-12-03

    The angel's wing oyster mushroom (Pleurocybella porrigens, Sugihiratake) is a well-known delicacy. However, its potential risk in acute encephalopathy was recently revealed by a food poisoning incident. To disclose the genes underlying the accident and provide mechanistic insight, we seek to develop an information infrastructure containing omics data. In our previous work, we sequenced the genome and transcriptome using next-generation sequencing techniques. The next step in achieving our goal is to develop a web database to facilitate the efficient mining of large-scale omics data and identification of genes specifically expressed in the mushroom. This paper introduces a web database A-WINGS (http://bioinf.mind.meiji.ac.jp/a-wings/) that provides integrated genomic and transcriptomic information for the angel's wing oyster mushroom. The database contains structure and functional annotations of transcripts and gene expressions. Functional annotations contain information on homologous sequences from NCBI nr and UniProt, Gene Ontology, and KEGG Orthology. Digital gene expression profiles were derived from RNA sequencing (RNA-seq) analysis in the fruiting bodies and mycelia. The omics information stored in the database is freely accessible through interactive and graphical interfaces by search functions that include 'GO TREE VIEW' browsing, keyword searches, and BLAST searches. The A-WINGS database will accelerate omics studies on specific aspects of the angel's wing oyster mushroom and the family Tricholomataceae.

  9. Strategies to explore functional genomics data sets in NCBI's GEO database.

    PubMed

    Wilhite, Stephen E; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.

  10. Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India.

    PubMed

    Pemberton, T J; Jakobsson, M; Conrad, D F; Coop, G; Wall, J D; Pritchard, J K; Patel, P I; Rosenberg, N A

    2008-07-01

    When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis - such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.

  11. Genome sequence and comparative analyses of atoxigenic Aspergillus flavus WRRL 1519

    USDA-ARS?s Scientific Manuscript database

    Aflatoxins are fungal secondary metabolites that often contaminate foodstuffs and crops, the major producer of which is Aspergillus flavus. Use of non-aflatoxigenic strains of A. flavus to compete against aflatoxin-producing strains has emerged as one of the best management practices for reducing af...

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Chen, I-Min A; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M; Kyrpides, Nikos C

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/

  13. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  14. Secondary metabolite profiles and antifungal drug susceptibility of Aspergillus fumigatus and closely related species, Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans.

    PubMed

    Tamiya, Hiroyuki; Ochiai, Eri; Kikuchi, Kazuyo; Yahiro, Maki; Toyotome, Takahito; Watanabe, Akira; Yaguchi, Takashi; Kamei, Katsuhiko

    2015-05-01

    The incidence of Aspergillus infection has been increasing in the past few years. Also, new Aspergillus fumigatus-related species, namely Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans, were shown to infect humans. These fungi exhibit marked morphological similarities to A. fumigatus, albeit with different clinical courses and antifungal drug susceptibilities. The present study used liquid chromatography/time-of-flight mass spectrometry to identify the secondary metabolites secreted as virulence factors by these Aspergillus species and compared their antifungal susceptibility. The metabolite profiles varied widely among A. fumigatus, A. lentulus, A. udagawae, and A. viridinutans, producing 27, 13, 8, and 11 substances, respectively. Among the mycotoxins, fumifungin, fumiquinazoline A/B and D, fumitremorgin B, gliotoxin, sphingofungins, pseurotins, and verruculogen were only found in A. fumigatus, whereas auranthine was only found in A. lentulus. The amount of gliotoxin, one of the most abundant mycotoxins in A. fumigatus, was negligible in these related species. In addition, they had decreased susceptibility to antifungal agents such as itraconazole and voriconazole, even though metabolites that were shared in the isolates showing higher minimum inhibitory concentrations than epidemiological cutoff values were not detected. These strikingly different secondary metabolite profiles may lead to the development of more discriminative identification protocols for such closely related Aspergillus species as well as improved treatment outcomes. Copyright © 2015 Japanese Society of Chemotherapy and The Japanese Association for Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  15. Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes.

    PubMed

    Gibbons, John G; Rokas, Antonis

    2009-03-01

    Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared. To be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.

  16. Characterization of a feruloyl esterase from Aspergillus terreus facilitates the division of fungal enzymes from Carbohydrate Esterase family 1 of the carbohydrate-active enzymes (CAZy) database.

    PubMed

    Mäkelä, Miia R; Dilokpimol, Adiphol; Koskela, Salla M; Kuuskeri, Jaana; de Vries, Ronald P; Hildén, Kristiina

    2018-04-26

    Feruloyl esterases (FAEs) are accessory enzymes for plant biomass degradation, which catalyse hydrolysis of carboxylic ester linkages between hydroxycinnamic acids and plant cell-wall carbohydrates. They are a diverse group of enzymes evolved from, e.g. acetyl xylan esterases (AXEs), lipases and tannases, thus complicating their classification and prediction of function by sequence similarity. Recently, an increasing number of fungal FAEs have been biochemically characterized, owing to their potential in various biotechnological applications and multitude of candidate FAEs in fungal genomes. However, only part of the fungal FAEs are included in Carbohydrate Esterase family 1 (CE1) of the carbohydrate-active enzymes (CAZy) database. In this work, we performed a phylogenetic analysis that divided the fungal members of CE1 into five subfamilies of which three contained characterized enzymes with conserved activities. Conservation within one of the subfamilies was confirmed by characterization of an additional CE1 enzyme from Aspergillus terreus. Recombinant A. terreus FaeD (AtFaeD) showed broad specificity towards synthetic methyl and ethyl esters, and released ferulic acid from plant biomass substrates, demonstrating its true FAE activity and interesting features as potential biocatalyst. The subfamily division of the fungal CE1 members enables more efficient selection of candidate enzymes for biotechnological processes. © 2018 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  17. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  18. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.

  19. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.

  20. Recent advances in reconstructing microbial secondary metabolites biosynthesis in Aspergillus spp.

    PubMed

    He, Yi; Wang, Bin; Chen, Wanping; Cox, Russell J; He, Jingren; Chen, Fusheng

    High throughput genome sequencing has revealed a multitude of potential secondary metabolites biosynthetic pathways that remain cryptic. Pathway reconstruction coupled with genetic engineering via heterologous expression enables discovery of novel compounds, elucidation of biosynthetic pathways, and optimization of product yields. Apart from Escherichia coli and yeast, fungi, especially Aspergillus spp., are well known and efficient heterologous hosts. This review summarizes recent advances in heterologous expression of microbial secondary metabolite biosynthesis in Aspergillus spp. We also discuss the technological challenges and successes in regard to heterologous host selection and DNA assembly behind the reconstruction of microbial secondary metabolite biosynthesis. Copyright © 2018 Elsevier Inc. All rights reserved.

  1. Identification and functional analysis of the aspergillic acid gene cluster in Aspergillus flavus

    USDA-ARS?s Scientific Manuscript database

    Aspergillus flavus can colonize important food staples and produces aflatoxins, toxic and carcinogenic secondary metabolites. In silico analysis of the A. flavus genome revealed 56 gene clusters encoding for secondary metabolites. How these many of these metabolites affect fungal development, surviv...

  2. Curated protein information in the Saccharomyces genome database.

    PubMed

    Hellerstedt, Sage T; Nash, Robert S; Weng, Shuai; Paskov, Kelley M; Wong, Edith D; Karra, Kalpana; Engel, Stacia R; Cherry, J Michael

    2017-01-01

    Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD; www.yeastgenome.org ) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim to facilitate cellular biology research. : www.yeastgenome.org. © The Author(s) 2017. Published by Oxford University Press.

  3. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regionsmore » have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters. Our results and datasets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.[Supplemental materials (10 figures, three text documents and 16 tables) have been made

  4. SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases

    PubMed Central

    Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A

    2018-01-01

    Abstract SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the pathologic component of the pathway tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N.tabacum has been carried out which resulted in the elimination of 156 pathways from the 569 pathways predicted by pathway tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases has substantially improved the curation status of the species–specific N.tabacum PGDB. The implementation of this

  5. Ribosomal subunit protein typing using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) for the identification and discrimination of Aspergillus species.

    PubMed

    Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Kusuya, Yoko; Takahashi, Hiroki; Yaguchi, Takashi

    2017-04-26

    Accurate identification of Aspergillus species is a very important subject. Mass spectral fingerprinting using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is generally employed for the rapid identification of fungal isolates. However, the results are based on simple mass spectral pattern-matching, with no peak assignment and no taxonomic input. We propose here a ribosomal subunit protein (RSP) typing technique using MALDI-TOF MS for the identification and discrimination of Aspergillus species. The results are concluded to be phylogenetic in that they reflect the molecular evolution of housekeeping RSPs. The amino acid sequences of RSPs of genome-sequenced strains of Aspergillus species were first verified and compared to compile a reliable biomarker list for the identification of Aspergillus species. In this process, we revealed that many amino acid sequences of RSPs (about 10-60%, depending on strain) registered in the public protein databases needed to be corrected or newly added. The verified RSPs were allocated to RSP types based on their mass. Peak assignments of RSPs of each sample strain as observed by MALDI-TOF MS were then performed to set RSP type profiles, which were then further processed by means of cluster analysis. The resulting dendrogram based on RSP types showed a relatively good concordance with the tree based on β-tubulin gene sequences. RSP typing was able to further discriminate the strains belonging to Aspergillus section Fumigati. The RSP typing method could be applied to identify Aspergillus species, even for species within section Fumigati. The discrimination power of RSP typing appears to be comparable to conventional β-tubulin gene analysis. This method would therefore be suitable for species identification and discrimination at the strain to species level. Because RSP typing can characterize the strains within section Fumigati, this method has potential as a powerful and reliable tool in

  6. Aspergillus mulundensis sp. nov., a new species for the fungus producing the antifungal echinocandin lipopeptides, mulundocandins.

    PubMed

    Bills, Gerald F; Yue, Qun; Chen, Li; Li, Yan; An, Zhiqiang; Frisvad, Jens C

    2016-03-01

    The invalidly published name Aspergillus sydowii var. mulundensis was proposed for a strain of Aspergillus that produced new echinocandin metabolites designated as the mulundocadins. Reinvestigation of this strain (Y-30462=DSMZ 5745) using phylogenetic, morphological, and metabolic data indicated that it is a distinct and novel species of Aspergillus sect. Nidulantes. The taxonomic novelty, Aspergillus mulundensis, is introduced for this historically important echinocandin-producing strain. The closely related A. nidulans FGSC A4 has one of the most extensively characterized secondary metabolomes of any filamentous fungus. Comparison of the full-genome sequences of DSMZ 5745 and FGSC A4 indicated that the two strains share 33 secondary metabolite biosynthetic gene clusters. These shared gene clusters represent ~45% of the total secondary metabolome of each strain, thus indicating a high level intraspecific divergence in terms of secondary metabolism.

  7. Aspergillus Section Fumigati Typing by PCR-Restriction Fragment Polymorphism▿

    PubMed Central

    Staab, Janet F.; Balajee, S. Arunmozhi; Marr, Kieren A.

    2009-01-01

    Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding β-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati. PMID:19403766

  8. Aspergillus section Fumigati typing by PCR-restriction fragment polymorphism.

    PubMed

    Staab, Janet F; Balajee, S Arunmozhi; Marr, Kieren A

    2009-07-01

    Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding beta-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati.

  9. OryzaGenome: Genome Diversity Database of Wild Oryza Species.

    PubMed

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  10. Aspergillus asper sp. nov. and Aspergillus collinsii sp. nov., from Aspergillus section Usti.

    PubMed

    Jurjevic, Zeljko; Peterson, Stephen W

    2016-07-01

    In sampling fungi from the built environment, two isolates that could not confidently be placed in described species were encountered. Phenotypic analysis suggested that they belonged in Aspergillus sect. Usti. In order to verify the sectional placement and to assure that they were undescribed rather than phenotypically aberrant isolates, DNA was isolated and sequenced at the beta-tubulin, calmodulin, internal transcribed spacer and RNA polymerase II loci and sequences compared with those from other species in the genus Aspergillus. At each locus, each new isolate was distant from existing species. Phylogenetic trees calculated from these data and GenBank data for species of the section Usti excluded the placement of these isolates in existing species, with statistical support. Because they were excluded from existing taxa, the distinct species Aspergillus asper (type strain NRRL 35910 T ) and Aspergillus collinsii (type strain NRRL 66196 T ) in sect. Usti are proposed to accommodate these strains.

  11. An Integrated Molecular Database on Indian Insects.

    PubMed

    Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

    2018-01-01

    MOlecular Database on Indian Insects (MODII) is an online database linking several databases like Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting information about phenomic and genomic information of agriculturally important insects. This insect resource database is available online for free at http://cib.res.in. http://cib.res.in/.

  12. Overexpression of Aspergillus tubingensis faeA in protease-deficient Aspergillus niger enables ferulic acid production from plant material.

    PubMed

    Zwane, Eunice N; Rose, Shaunita H; van Zyl, Willem H; Rumbold, Karl; Viljoen-Bloom, Marinda

    2014-06-01

    The production of ferulic acid esterase involved in the release of ferulic acid side groups from xylan was investigated in strains of Aspergillus tubingensis, Aspergillus carneus, Aspergillus niger and Rhizopus oryzae. The highest activity on triticale bran as sole carbon source was observed with the A. tubingensis T8.4 strain, which produced a type A ferulic acid esterase active against methyl p-coumarate, methyl ferulate and methyl sinapate. The activity of the A. tubingensis ferulic acid esterase (AtFAEA) was inhibited twofold by glucose and induced twofold in the presence of maize bran. An initial accumulation of endoglucanase was followed by the production of endoxylanase, suggesting a combined action with ferulic acid esterase on maize bran. A genomic copy of the A. tubingensis faeA gene was cloned and expressed in A. niger D15#26 under the control of the A. niger gpd promoter. The recombinant strain has reduced protease activity and does not acidify the media, therefore promoting high-level expression of recombinant enzymes. It produced 13.5 U/ml FAEA after 5 days on autoclaved maize bran as sole carbon source, which was threefold higher than for the A. tubingensis donor strain. The recombinant AtFAEA was able to extract 50 % of the available ferulic acid from non-pretreated maize bran, making this enzyme suitable for the biological production of ferulic acid from lignocellulosic plant material.

  13. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  14. Pathway Analysis and Omics Data Visualization Using Pathway Genome Databases: FragariaCyc, a Case Study.

    PubMed

    Naithani, Sushma; Jaiswal, Pankaj

    2017-01-01

    The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of the genome-scale expression data to understand changes in the overall metabolisms of an organism (or organs, tissues, and cells) in response to various extrinsic (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.

  15. Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database

    PubMed Central

    Wilhite, Stephen E.; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872

  16. The PathoYeastract database: an information system for the analysis of gene and genomic transcription regulation in pathogenic yeasts.

    PubMed

    Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho

    2017-01-04

    We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. The evolutionary imprint of domestication on genome variation and function of the filamentous fungus Aspergillus oryzae

    PubMed Central

    Gibbons, John G.; Salichos, Leonidas; Slot, Jason C.; Rinker, David C.; McGary, Kriston L.; King, Jonas G.; Klich, Maren A.; Tabb, David L.; McDonald, W. Hayes; Rokas, Antonis

    2012-01-01

    Summary The domestication of animals, plants and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2–6]. We systematically examined genome-wide sequence and functional variation between the domesticated fungus Aspergillus oryzae, whose saccharification abilities humans have harnessed for thousands of years to produce sake, soy sauce and miso from starch-rich grains, and its wild relative A. flavus, a potentially toxigenic plant and animal pathogen [7]. We discovered dramatic changes in the sequence variation and abundance profiles of genes and wholesale primary and secondary metabolic pathways between domesticated and wild relative isolates during growth on rice. Through selection by humans, our data suggest that an atoxigenic lineage of A. flavus gradually evolved into a “cell factory” for enzymes and metabolites involved in the saccharification process. These results suggest that whereas animal and plant domestication was largely driven by Neolithic “genetic tinkering” of developmental pathways, microbe domestication was driven by extensive remodeling of metabolism. PMID:22795693

  18. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    PubMed Central

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  19. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation

    PubMed Central

    Kristensen, David M.; Wolf, Yuri I.; Koonin, Eugene V.

    2017-01-01

    The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of ‘index’ orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. PMID:28053163

  20. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGES

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; ...

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based functionmore » assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.« less

  1. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based functionmore » assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.« less

  2. The University of Minnesota Biocatalysis/Biodegradation Database: specialized metabolism for functional genomics.

    PubMed Central

    Ellis, L B; Hershberger, C D; Wackett, L P

    1999-01-01

    The University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://www.labmed.umn.edu/umbbd/i nde x.html) first became available on the web in 1995 to provide information on microbial biocatalytic reactions of, and biodegradation pathways for, organic chemical compounds, especially those produced by man. Its goal is to become a representative database of biodegradation, spanning the diversity of known microbial metabolic routes, organic functional groups, and environmental conditions under which biodegradation occurs. The database can be used to enhance understanding of basic biochemistry, biocatalysis leading to speciality chemical manufacture, and biodegradation of environmental pollutants. It is also a resource for functional genomics, since it contains information on enzymes and genes involved in specialized metabolism not found in intermediary metabolism databases, and thus can assist in assigning functions to genes homologous to such less common genes. With information on >400 reactions and compounds, it is poised to become a resource for prediction of microbial biodegradation pathways for compounds it does not contain, a process complementary to predicting the functions of new classes of microbial genes. PMID:9847233

  3. Identification of Aspergillus sections Flavi, Nigri, and Fumigati and their differentiation using specific primers.

    PubMed

    Ashtiani, Nafiseh Mohebbi; Kachuei, Reza; Yalfani, Roozbeh; Harchegani, Asghar Beigi; Nosratabadi, Mohsen

    2017-06-01

    Aspergillus species are important in medicine, agriculture and various industries. The sections Fumigati, Flavi, and Nigri are the most important members of the Aspergillus genus. This study intended to identify and separate these three Aspergillus sections and to differentiate among them using specific primers. A bioinformatics study was initially performed to analyse the sequences of five genes, namely, beta-tubulin, calmodulin, the pre-rRNA processing protein Tsr1, the DNA-replication licensing factor Mcm7, and RNA polymerase II second largest subunit (RPB2) in the three Aspergillus sections using MEGA6 software and the NCBI database. Primers were designed to select genes for each of the Aspergillus sections being analysed. A total of 134 environmental and clinical Aspergillus species were isolated, purified and initially identified by colony morphology.. Subsequently, DNA was extracted using the phenol-chloroform method, specific primers were synthesized, PCR was performed for DNA from all isolates, and the results were compared to morphological characteristics. Of the 134 isolates tested, 56 were Nigri, 32 were Fumigati, 32 were Flavi, and the rest (14 isolates) belonged to other sections. The beta-tubulin and calmodulin genes were found to be the most suitable for differentiating among these three groups; the beta-tubulin gene was used for molecular identification of Aspergillus section Fumigati, and the calmodulin gene for identifying sections Flavi and Nigri.

  4. FPD: A comprehensive phosphorylation database in fungi.

    PubMed

    Bai, Youhuang; Chen, Bin; Li, Mingzhu; Zhou, Yincong; Ren, Silin; Xu, Qin; Chen, Ming; Wang, Shihua

    2017-10-01

    Protein phosphorylation, one of the most classic post-translational modification, plays a critical role in diverse cellular processes including cell cycle, growth, and signal transduction pathways. However, the available information about phosphorylation in fungi is limited. Here, we provided a Fungi Phosphorylation Database (FPD) that comprises high-confidence in vivo phosphosites identified by MS-based proteomics in various fungal species. This comprehensive phosphorylation database contains 62 272 non-redundant phosphorylation sites in 11 222 proteins across eight organisms, including Aspergillus flavus, Aspergillus nidulans, Fusarium graminearum, Magnaporthe oryzae, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Cryptococcus neoformans. A fungi-specific phosphothreonine motif and several conserved phosphorylation motifs were discovered by comparatively analysing the pattern of phosphorylation sites in plants, animals, and fungi. Copyright © 2017 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  5. LAMP-PCR detection of ochratoxigenic Aspergillus species collected from peanut kernel.

    PubMed

    Al-Sheikh, H M

    2015-01-30

    Over the last decade, ochratoxin A (OTA) has been widely described and is ubiquitous in several agricultural products. Ochratoxins represent the second-most important mycotoxin group after aflatoxins. A total of 34 samples were surveyed from 3 locations, including Mecca, Madina, and Riyadh, Saudi Arabia, during 2012. Fungal contamination frequency was determined for surface-sterilized peanut seeds, which were seeded onto malt extract agar media. Aspergillus niger (35%), Aspergillus ochraceus (30%), and Aspergillus carbonarius (25%) were the most frequently observed Aspergillius species, while Aspergillus flavus and Aspergillus phoenicis isolates were only infrequently recovered and in small numbers (10%). OTA production was evaluated on yeast extract sucrose medium, which revealed that 57% of the isolates were A. niger and 60% of A. carbonarius isolates were OTA producers; 100% belonged to A. ochraceus. Only one isolate, morphologically identified as A. carbonarius, and 3 A. niger isolates unstably produced OTA. A polymerase chain reaction (PCR)-based identification and detection assay was used to identify A. ochraceus isolates. Using the primer sets OCRA1/OCRA2, 400-base pair PCR fragments were produced only when genomic DNA from A. ochraceus isolates was used. Recently, the loop-mediated isothermal amplification assay using recombinase polymerase amplification chemistry was used for A. carbonarius and A. niger DNA identification. As a non-gel-based technique, the amplification product was directly visualized in the reaction tube after adding calcein for naked-eye examination.

  6. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full-advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  7. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    PubMed

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

  8. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  9. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India.

    PubMed

    Masih, Aradhana; Singh, Pradeep K; Kathuria, Shallu; Agarwal, Kshitij; Meis, Jacques F; Chowdhary, Anuradha

    2016-09-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization-time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  10. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India

    PubMed Central

    Masih, Aradhana; Singh, Pradeep K.; Kathuria, Shallu; Agarwal, Kshitij

    2016-01-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization–time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri. Matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma. PMID:27413188

  11. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    PubMed

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network

  12. What Does Genetic Diversity of Aspergillus flavus Tell Us About Aspergillus oryzae?

    USDA-ARS?s Scientific Manuscript database

    Aspergillus flavus and Aspergillus oryzae belong to Aspergillus section Flavi. They are closely related and are of significant economic importance. The former species has the ability to produce harmful aflatoxins while the latter is widely used in food fermentation and industrial enzyme production. ...

  13. The Génolevures database.

    PubMed

    Martin, Tiphaine; Sherman, David J; Durrens, Pascal

    2011-01-01

    The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  14. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry for fast and accurate identification of clinically relevant Aspergillus species.

    PubMed

    Alanio, A; Beretti, J-L; Dauphin, B; Mellado, E; Quesne, G; Lacroix, C; Amara, A; Berche, P; Nassif, X; Bougnoux, M-E

    2011-05-01

    New Aspergillus species have recently been described with the use of multilocus sequencing in refractory cases of invasive aspergillosis. The classical phenotypic identification methods routinely used in clinical laboratories failed to identify them adequately. Some of these Aspergillus species have specific patterns of susceptibility to antifungal agents, and misidentification may lead to inappropriate therapy. We developed a matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS)-based strategy to adequately identify Aspergillus species to the species level. A database including the reference spectra of 28 clinically relevant species from seven Aspergillus sections (five common and 23 unusual species) was engineered. The profiles of young and mature colonies were analysed for each reference strain, and species-specific spectral fingerprints were identified. The performance of the database was then tested on 124 clinical and 16 environmental isolates previously characterized by partial sequencing of the β-tubulin and calmodulin genes. One hundred and thirty-eight isolates of 140 (98.6%) were correctly identified. Two atypical isolates could not be identified, but no isolate was misidentified (specificity: 100%). The database, including species-specific spectral fingerprints of young and mature colonies of the reference strains, allowed identification regardless of the maturity of the clinical isolate. These results indicate that MALDI-TOF MS is a powerful tool for rapid and accurate identification of both common and unusual species of Aspergillus. It can give better results than morphological identification in clinical laboratories. © 2010 The Authors. Clinical Microbiology and Infection © 2010 European Society of Clinical Microbiology and Infectious Diseases.

  15. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  16. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  17. Development of RFLP-PCR method for the identification of medically important Aspergillus species using single restriction enzyme MwoI.

    PubMed

    Diba, K; Mirhendi, H; Kordbacheh, P; Rezaie, S

    2014-01-01

    In this study we attempted to modify the PCR-RFLP method using restriction enzyme MwoI for the identification of medically important Aspergillus species. Our subjects included nine standard Aspergillus species and 205 Aspergillus isolates of approved hospital acquired infections and hospital indoor sources. First of all, Aspergillus isolates were identified in the level of species by using morphologic method. A twenty four hours culture was performed for each isolates to harvest Aspergillus mycelia and then genomic DNA was extracted using Phenol-Chloroform method. PCR-RFLP using single restriction enzyme MwoI was performed in ITS regions of rDNA gene. The electrophoresis data were analyzed and compared with those of morphologic identifications. Total of 205 Aspergillus isolates included 153 (75%) environmental and 52 (25%) clinical isolates. A. flavus was the most frequently isolate in our study (55%), followed by A. niger 65(31.7%), A. fumigatus 18(8.7%), A. nidulans and A. parasiticus 2(1% each). MwoI enabled us to discriminate eight medically important Aspergillus species including A. fumigatus, A. niger, A. flavus as the most common isolated species. PCR-RFLP method using the restriction enzyme MwoI is a rapid and reliable test for identification of at least the most medically important Aspergillus species.

  18. Development of RFLP-PCR method for the identification of medically important Aspergillus species using single restriction enzyme MwoI

    PubMed Central

    Diba, K.; Mirhendi, H.; Kordbacheh, P.; Rezaie, S.

    2014-01-01

    In this study we attempted to modify the PCR-RFLP method using restriction enzyme MwoI for the identification of medically important Aspergillus species. Our subjects included nine standard Aspergillus species and 205 Aspergillus isolates of approved hospital acquired infections and hospital indoor sources. First of all, Aspergillus isolates were identified in the level of species by using morphologic method. A twenty four hours culture was performed for each isolates to harvest Aspergillus mycelia and then genomic DNA was extracted using Phenol-Chloroform method. PCR-RFLP using single restriction enzyme MwoI was performed in ITS regions of rDNA gene. The electrophoresis data were analyzed and compared with those of morphologic identifications. Total of 205 Aspergillus isolates included 153 (75%) environmental and 52 (25%) clinical isolates. A. flavus was the most frequently isolate in our study (55%), followed by A. niger 65(31.7%), A. fumigatus 18(8.7%), A. nidulans and A. parasiticus 2(1% each). MwoI enabled us to discriminate eight medically important Aspergillus species including A. fumigatus, A. niger, A. flavus as the most common isolated species. PCR-RFLP method using the restriction enzyme MwoI is a rapid and reliable test for identification of at least the most medically important Aspergillus species. PMID:25242934

  19. Genome-scale analysis of the high-efficient protein secretion system of Aspergillus oryzae.

    PubMed

    Liu, Lifang; Feizi, Amir; Österlund, Tobias; Hjort, Carsten; Nielsen, Jens

    2014-06-24

    The koji mold, Aspergillus oryzae is widely used for the production of industrial enzymes due to its particularly high protein secretion capacity and ability to perform post-translational modifications. However, systemic analysis of its secretion system is lacking, generally due to the poorly annotated proteome. Here we defined a functional protein secretory component list of A. oryzae using a previously reported secretory model of S. cerevisiae as scaffold. Additional secretory components were obtained by blast search with the functional components reported in other closely related fungal species such as Aspergillus nidulans and Aspergillus niger. To evaluate the defined component list, we performed transcriptome analysis on three α-amylase over-producing strains with varying levels of secretion capacities. Specifically, secretory components involved in the ER-associated processes (including components involved in the regulation of transport between ER and Golgi) were significantly up-regulated, with many of them never been identified for A. oryzae before. Furthermore, we defined a complete list of the putative A. oryzae secretome and monitored how it was affected by overproducing amylase. In combination with the transcriptome data, the most complete secretory component list and the putative secretome, we improved the systemic understanding of the secretory machinery of A. oryzae in response to high levels of protein secretion. The roles of many newly predicted secretory components were experimentally validated and the enriched component list provides a better platform for driving more mechanistic studies of the protein secretory pathway in this industrially important fungus.

  20. A nonribosomal peptide synthetase (Pes1) confers protection against oxidative stress in Aspergillus fumigatus.

    PubMed

    Reeves, Emer P; Reiber, Kathrin; Neville, Claire; Scheibner, Olaf; Kavanagh, Kevin; Doyle, Sean

    2006-07-01

    Aspergillus fumigatus is an important human fungal pathogen. The Aspergillus fumigatus genome contains 14 nonribosomal peptide synthetase genes, potentially responsible for generating metabolites that contribute to organismal virulence. Differential expression of the nonribosomal peptide synthetase gene, pes1, in four strains of Aspergillus fumigatus was observed. The pattern of pes1 expression differed from that of a putative siderophore synthetase gene, sidD, and so is unlikely to be involved in iron acquisition. The Pes1 protein (expected molecular mass 698 kDa) was partially purified and identified by immunoreactivity, peptide mass fingerprinting (36% sequence coverage) and MALDI LIFT-TOF/TOF MS (four internal peptides sequenced). A pes1 disruption mutant (delta pes1) of Aspergillus fumigatus strain 293.1 was generated and confirmed by Southern and western analysis, in addition to RT-PCR. The delta pes1 mutant also showed significantly reduced virulence in the Galleria mellonella model system (P < 0.001) and increased sensitivity to oxidative stress (P = 0.002) in culture and during neutrophil-mediated phagocytosis. In addition, the mutant exhibited altered conidial surface morphology and hydrophilicity, compared to Aspergillus fumigatus 293.1. It is concluded that pes1 contributes to improved fungal tolerance against oxidative stress, mediated by the conidial phenotype, during the infection process.

  1. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data

    PubMed Central

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-01-01

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738

  2. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

    PubMed

    Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.

  3. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases

    PubMed Central

    Sanderson, Lacey-Anne; Ficklin, Stephen P.; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A.; Bett, Kirstin E.; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/ PMID:24163125

  4. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

    PubMed Central

    Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.

    2010-01-01

    The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD—taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719

  5. Updated regulation curation model at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael

    2018-01-01

    Abstract The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362

  6. Aspergillus fumigatus and Related Species

    PubMed Central

    Sugui, Janyce A.; Kwon-Chung, Kyung J.; Juvvadi, Praveen R.; Latgé, Jean-Paul; Steinbach, William J.

    2015-01-01

    The genus Aspergillus contains etiologic agents of aspergillosis. The clinical manifestations of the disease range from allergic reaction to invasive pulmonary infection. Among the pathogenic aspergilli, Aspergillus fumigatus is most ubiquitous in the environment and is the major cause of the disease, followed by Aspergillus flavus, Aspergillus niger, Aspergillus terreus, Aspergillus nidulans, and several species in the section Fumigati that morphologically resemble A. fumigatus. Patients that are at risk for acquiring aspergillosis are those with an altered immune system. Early diagnosis, species identification, and adequate antifungal therapy are key elements for treatment of the disease, especially in cases of pulmonary invasive aspergillosis that often advance very rapidly. Incorporating knowledge of the basic biology of Aspergillus species to that of the diseases that they cause is fundamental for further progress in the field. PMID:25377144

  7. Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification.

    PubMed

    Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi

    2016-01-01

    We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are in confusion, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have a potential to be useful biomarkers for identifying clinical isolates of A. fumigatus .

  8. Performance of Matrix-Assisted Laser Desorption Ionization−Time of Flight Mass Spectrometry for Identification of Aspergillus, Scedosporium, and Fusarium spp. in the Australian Clinical Setting

    PubMed Central

    Sleiman, Sue; Halliday, Catriona L.; Chapman, Belinda; Brown, Mitchell; Nitschke, Joanne; Lau, Anna F.

    2016-01-01

    We developed an Australian database for the identification of Aspergillus, Scedosporium, and Fusarium species (n = 28) by matrix-assisted laser desorption ionization−time of flight mass spectrometry (MALDI-TOF MS). In a challenge against 117 isolates, species identification significantly improved when the in-house-built database was combined with the Bruker Filamentous Fungi Library compared with that for the Bruker library alone (Aspergillus, 93% versus 69%; Fusarium, 84% versus 42%; and Scedosporium, 94% versus 18%, respectively). PMID:27252460

  9. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

    PubMed

    Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

    2017-01-04

    The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  10. [Progress in omics research of Aspergillus niger].

    PubMed

    Sui, Yufei; Ouyang, Liming; Lu, Hongzhong; Zhuang, Yingping; Zhang, Siliang

    2016-08-25

    Aspergillus niger, as an important industrial fermentation strain, is widely applied in the production of organic acids and industrial enzymes. With the development of diverse omics technologies, the data of genome, transcriptome, proteome and metabolome of A. niger are increasing continuously, which declared the coming era of big data for the research in fermentation process of A. niger. The data analysis from single omics and the comparison of multi-omics, to the integrations of multi-omics based on the genome-scale metabolic network model largely extends the intensive and systematic understanding of the efficient production mechanism of A. niger. It also provides possibilities for the reasonable global optimization of strain performance by genetic modification and process regulation. We reviewed and summarized progress in omics research of A. niger, and proposed the development direction of omics research on this cell factory.

  11. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M; Kyrpides, Nikos C

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.

  12. Genetic diversity of Aspergillus species isolated from onychomycosis and Aspergillus hongkongensis sp. nov., with implications to antifungal susceptibility testing.

    PubMed

    Tsang, Chi-Ching; Hui, Teresa W S; Lee, Kim-Chung; Chen, Jonathan H K; Ngan, Antonio H Y; Tam, Emily W T; Chan, Jasper F W; Wu, Andrea L; Cheung, Mei; Tse, Brian P H; Wu, Alan K L; Lai, Christopher K C; Tsang, Dominic N C; Que, Tak-Lun; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2016-02-01

    Thirteen Aspergillus isolates recovered from nails of 13 patients (fingernails, n=2; toenails, n=11) with onychomycosis were characterized. Twelve strains were identified by multilocus sequencing as Aspergillus spp. (Aspergillus sydowii [n=4], Aspergillus welwitschiae [n=3], Aspergillus terreus [n=2], Aspergillus flavus [n=1], Aspergillus tubingensis [n=1], and Aspergillus unguis [n=1]). Isolates of A. terreus, A. flavus, and A. unguis were also identifiable by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The 13th isolate (HKU49(T)) possessed unique morphological characteristics different from other Aspergillus spp. Molecular characterization also unambiguously showed that HKU49(T) was distinct from other Aspergillus spp. We propose the novel species Aspergillus hongkongensis to describe this previously unknown fungus. Antifungal susceptibility testing showed most Aspergillus isolates had low MICs against itraconazole and voriconazole, but all Aspergillus isolates had high MICs against fluconazole. A diverse spectrum of Aspergillus species is associated with onychomycosis. Itraconazole and voriconazole are probably better drug options for Aspergillus onychomycosis. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Genome-scale analysis of the high-efficient protein secretion system of Aspergillus oryzae

    PubMed Central

    2014-01-01

    Background The koji mold, Aspergillus oryzae is widely used for the production of industrial enzymes due to its particularly high protein secretion capacity and ability to perform post-translational modifications. However, systemic analysis of its secretion system is lacking, generally due to the poorly annotated proteome. Results Here we defined a functional protein secretory component list of A. oryzae using a previously reported secretory model of S. cerevisiae as scaffold. Additional secretory components were obtained by blast search with the functional components reported in other closely related fungal species such as Aspergillus nidulans and Aspergillus niger. To evaluate the defined component list, we performed transcriptome analysis on three α-amylase over-producing strains with varying levels of secretion capacities. Specifically, secretory components involved in the ER-associated processes (including components involved in the regulation of transport between ER and Golgi) were significantly up-regulated, with many of them never been identified for A. oryzae before. Furthermore, we defined a complete list of the putative A. oryzae secretome and monitored how it was affected by overproducing amylase. Conclusion In combination with the transcriptome data, the most complete secretory component list and the putative secretome, we improved the systemic understanding of the secretory machinery of A. oryzae in response to high levels of protein secretion. The roles of many newly predicted secretory components were experimentally validated and the enriched component list provides a better platform for driving more mechanistic studies of the protein secretory pathway in this industrially important fungus. PMID:24961398

  14. Misidentification of Aspergillus nomius and Aspergillus tamarii as Aspergillus flavus: characterization by internal transcribed spacer, β-Tubulin, and calmodulin gene sequencing, metabolic fingerprinting, and matrix-assisted laser desorption ionization-time of flight mass spectrometry.

    PubMed

    Tam, Emily W T; Chen, Jonathan H K; Lau, Eunice C L; Ngan, Antonio H Y; Fung, Kitty S C; Lee, Kim-Chung; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2014-04-01

    Aspergillus nomius and Aspergillus tamarii are Aspergillus species that phenotypically resemble Aspergillus flavus. In the last decade, a number of case reports have identified A. nomius and A. tamarii as causes of human infections. In this study, using an internal transcribed spacer, β-tubulin, and calmodulin gene sequencing, only 8 of 11 clinical isolates reported as A. flavus in our clinical microbiology laboratory by phenotypic methods were identified as A. flavus. The other three isolates were A. nomius (n = 2) or A. tamarii (n = 1). The results corresponded with those of metabolic fingerprinting, in which the A. flavus, A. nomius, and A. tamarii strains were separated into three clusters based on ultra-high-performance liquid chromatography-tandem mass spectrometry (UHPLC MS) analysis. The first two patients with A. nomius infections had invasive aspergillosis and chronic cavitary and fibrosing pulmonary and pleural aspergillosis, respectively, whereas the third patient had A. tamarii colonization of the airway. Identification of the 11 clinical isolates and three reference strains by matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) showed that only six of the nine strains of A. flavus were identified correctly. None of the strains of A. nomius and A. tamarii was correctly identified. β-Tubulin or the calmodulin gene should be the gene target of choice for identifying A. flavus, A. nomius, and A. tamarii. To improve the usefulness of MALDI-TOF MS, the number of strains for each species in MALDI-TOF MS databases should be expanded to cover intraspecies variability.

  15. Misidentification of Aspergillus nomius and Aspergillus tamarii as Aspergillus flavus: Characterization by Internal Transcribed Spacer, β-Tubulin, and Calmodulin Gene Sequencing, Metabolic Fingerprinting, and Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry

    PubMed Central

    Tam, Emily W. T.; Chen, Jonathan H. K.; Lau, Eunice C. L.; Ngan, Antonio H. Y.; Fung, Kitty S. C.; Lee, Kim-Chung; Lam, Ching-Wan; Yuen, Kwok-Yung

    2014-01-01

    Aspergillus nomius and Aspergillus tamarii are Aspergillus species that phenotypically resemble Aspergillus flavus. In the last decade, a number of case reports have identified A. nomius and A. tamarii as causes of human infections. In this study, using an internal transcribed spacer, β-tubulin, and calmodulin gene sequencing, only 8 of 11 clinical isolates reported as A. flavus in our clinical microbiology laboratory by phenotypic methods were identified as A. flavus. The other three isolates were A. nomius (n = 2) or A. tamarii (n = 1). The results corresponded with those of metabolic fingerprinting, in which the A. flavus, A. nomius, and A. tamarii strains were separated into three clusters based on ultra-high-performance liquid chromatography-tandem mass spectrometry (UHPLC MS) analysis. The first two patients with A. nomius infections had invasive aspergillosis and chronic cavitary and fibrosing pulmonary and pleural aspergillosis, respectively, whereas the third patient had A. tamarii colonization of the airway. Identification of the 11 clinical isolates and three reference strains by matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) showed that only six of the nine strains of A. flavus were identified correctly. None of the strains of A. nomius and A. tamarii was correctly identified. β-Tubulin or the calmodulin gene should be the gene target of choice for identifying A. flavus, A. nomius, and A. tamarii. To improve the usefulness of MALDI-TOF MS, the number of strains for each species in MALDI-TOF MS databases should be expanded to cover intraspecies variability. PMID:24452174

  16. Prospecting for the incidence of genes involved in ochratoxin and fumonisin biosynthesis in Brazilian strains of Aspergillus niger and Aspergillus welwitschiae.

    PubMed

    Massi, Fernanda Pelisson; Sartori, Daniele; de Souza Ferranti, Larissa; Iamanaka, Beatriz Thie; Taniwaki, Marta Hiromi; Vieira, Maria Lucia Carneiro; Fungaro, Maria Helena Pelegrinelli

    2016-03-16

    Aspergillus niger "aggregate" is an informal taxonomic rank that represents a group of species from the section Nigri. Among A. niger "aggregate" species Aspergillus niger sensu stricto and its cryptic species Aspergillus welwitschiae (=Aspergillus awamori sensu Perrone) are proven as ochratoxin A and fumonisin B2 producing species. A. niger has been frequently found in tropical and subtropical foods. A. welwitschiae is a new species, which was recently dismembered from the A. niger taxon. These species are morphologically very similar and molecular data are indispensable for their identification. A total of 175 Brazilian isolates previously identified as A. niger collected from dried fruits, Brazil nuts, coffee beans, grapes, cocoa and onions were investigated in this study. Based on partial calmodulin gene sequences about one-half of our isolates were identified as A. welwitschiae. This new species was the predominant species in onions analyzed in Brazil. A. niger and A. welwitschiae differ in their ability to produce ochratoxin A and fumonisin B2. Among A. niger isolates, approximately 32% were OTA producers, but in contrast only 1% of the A. welwitschiae isolates revealed the ability to produce ochratoxin A. Regarding fumonisin B2 production, there was a higher frequency of FB2 producing isolates in A. niger (74%) compared to A. welwitschiae (34%). Because not all A. niger and A. welwitschiae strains produce ochratoxin A and fumonisin B2, in this study a multiplex PCR was developed for detecting the presence of essential genes involved in ochratoxin (polyketide synthase and radHflavin-dependent halogenase) and fumonisin (α-oxoamine synthase) biosynthesis in the genome of A. niger and A. welwitschiae isolates. The frequency of strains harboring the mycotoxin genes was markedly different between A. niger and A. welwitschiae. All OTA producing isolates of A. niger and A. welwitschiae showed in their genome the pks and radH genes, and 95.2% of the nonproducing

  17. Genome-Wide Association Mapping of and Aspergillus flavus Aflatoxin Accumulation Resistance in Maize

    Treesearch

    Marilyn L. Warburton; Juliet D. Tang; Gary L. Windham; Leigh K. Hawkins; Seth C. Murray; Wenwei Xu; Debbie Boykin; Andy Perkins; W. Paul Williams

    2015-01-01

    Contamination of maize (Zea mays L.) with aflatoxin, produced by the fungus Aspergillus flavus Link, has severe health and economic consequences. Efforts to reduce aflatoxin accumulation in maize have focused on identifying and selecting germplasm with natural host resistance factors, and several maize lines with significantly...

  18. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data.

    PubMed

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-09-18

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Public variant databases: liability?

    PubMed

    Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

    2017-07-01

    Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing.Genet Med advance online publication 15 December 2016.

  20. Elusive Origins of the Extra Genes in Aspergillus oryzae

    PubMed Central

    Khaldi, Nora; Wolfe, Kenneth H.

    2008-01-01

    The genome sequence of Aspergillus oryzae revealed unexpectedly that this species has approximately 20% more genes than its congeneric species A. nidulans and A. fumigatus. Where did these extra genes come from? Here, we evaluate several possible causes of the elevated gene number. Many gene families are expanded in A. oryzae relative to A. nidulans and A. fumigatus, but we find no evidence of ancient whole-genome duplication or other segmental duplications, either in A. oryzae or in the common ancestor of the genus Aspergillus. We show that the presence of divergent pairs of paralogs is a feature peculiar to A. oryzae and is not shared with A. nidulans or A. fumigatus. In phylogenetic trees that include paralog pairs from A. oryzae, we frequently find that one of the genes in a pair from A. oryzae has the expected orthologous relationship with A. nidulans, A. fumigatus and other species in the subphylum Eurotiomycetes, whereas the other A. oryzae gene falls outside this clade but still within the Ascomycota. We identified 456 such gene pairs in A. oryzae. Further phylogenetic analysis did not however indicate a single consistent evolutionary origin for the divergent members of these pairs. Approximately one-third of them showed phylogenies that are suggestive of horizontal gene transfer (HGT) from Sordariomycete species, and these genes are closer together in the A. oryzae genome than expected by chance, but no unique Sordariomycete donor species was identifiable. The postulated HGTs from Sordariomycetes still leave the majority of extra A. oryzae genes unaccounted for. One possible explanation for our observations is that A. oryzae might have been the recipient of many separate HGT events from diverse donors. PMID:18725939

  1. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    PubMed Central

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  2. Characterization of the product of a nonribosomal peptide synthetase-like (NRPS-like) gene using the doxycycline dependent Tet-on system in Aspergillus terreus.

    PubMed

    Sun, Wei-Wen; Guo, Chun-Jun; Wang, Clay C C

    2016-04-01

    Genome sequencing of the fungus Aspergillus terreus uncovered a number of silent core structural biosynthetic genes encoding enzymes presumed to be involved in the production of cryptic secondary metabolites. There are five nonribosomal peptide synthetase (NRPS)-like genes with the predicted A-T-TE domain architecture within the A. terreus genome. Among the five genes, only the product of pgnA remains unknown. The Tet-on system is an inducible, tunable and metabolism-independent expression system originally developed for Aspergillus niger. Here we report the adoption of the Tet-on system as an effective gene activation tool in A. terreus. Application of this system in A. terreus allowed us to uncover the product of the cryptic NRPS-like gene, pgnA. Furthermore expression of pgnA in the heterologous Aspergillus nidulans host suggested that the pgnA gene alone is necessary for phenguignardic acid (1) biosynthesis. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  4. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  5. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  6. Public variant databases: liability?

    PubMed Central

    Thorogood, Adrian; Cook-Deegan, Robert; Knoppers, Bartha Maria

    2017-01-01

    Public variant databases support the curation, clinical interpretation, and sharing of genomic data, thus reducing harmful errors or delays in diagnosis. As variant databases are increasingly relied on in the clinical context, there is concern that negligent variant interpretation will harm patients and attract liability. This article explores the evolving legal duties of laboratories, public variant databases, and physicians in clinical genomics and recommends a governance framework for databases to promote responsible data sharing. Genet Med advance online publication 15 December 2016 PMID:27977006

  7. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data modelmore » that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.« less

  8. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data modelmore » that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.« less

  9. Performance of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry for Identification of Aspergillus, Scedosporium, and Fusarium spp. in the Australian Clinical Setting.

    PubMed

    Sleiman, Sue; Halliday, Catriona L; Chapman, Belinda; Brown, Mitchell; Nitschke, Joanne; Lau, Anna F; Chen, Sharon C-A

    2016-08-01

    We developed an Australian database for the identification of Aspergillus, Scedosporium, and Fusarium species (n = 28) by matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS). In a challenge against 117 isolates, species identification significantly improved when the in-house-built database was combined with the Bruker Filamentous Fungi Library compared with that for the Bruker library alone (Aspergillus, 93% versus 69%; Fusarium, 84% versus 42%; and Scedosporium, 94% versus 18%, respectively). Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  10. Genome-wide analysis of the Zn(II)2Cys6 zinc cluster-encoding gene family in Aspergillus flavus

    USDA-ARS?s Scientific Manuscript database

    Proteins with a Zn(II)2Cys6 domain, Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overvie...

  11. Survey of candidate genes for maize resistance to infection by Aspergillus flavus and/or aflatoxin contamination

    Treesearch

    Leigh Hawkins; Marilyn Warburton; Juliet Tang; John Tomashek; Dafne Alves Oliveira; Oluwaseun Ogunola; J. Smith; W. Williams

    2018-01-01

    Many projects have identified candidate genes for resistance to aflatoxin accumulation or Aspergillus flavus infection and growth in maize using genetic mapping, genomics, transcriptomics and/or proteomics studies. However, only a small percentage of these candidates have been validated in field conditions, and their relative contribution to...

  12. The MAR databases: development and implementation of databases specific for marine metagenomics

    PubMed Central

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen

    2018-01-01

    Abstract We introduce the marine databases; MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incomplete sequenced prokaryotic genomes regardless level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. PMID:29106641

  13. Aspergillus--classification and antifungal susceptibilities.

    PubMed

    Buzina, Walter

    2013-01-01

    Aspergillus is one of the most important fungal genera for the man, for its industrial use, its ability to spoil food and not least its medical impact as cause of a variety of diseases. Currently hundreds of species of Aspergillus are known; nearly fifty of them are able to cause infections in humans and animals. Recently, the genus Aspergillus is subdivided into 8 subgenera and 22 sections. The spectrum of diseases caused by Aspergillus species varies from superficial cutaneous to invasive and systemic infections. All species of Aspergillus investigated so far are resistant against the antifungals fluconazole and 5-fluorocytosine, the range of susceptibilities to currently available antifungals is discussed in this paper.

  14. Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification

    PubMed Central

    Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi

    2016-01-01

    We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are in confusion, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have a potential to be useful biomarkers for identifying clinical isolates of A. fumigatus. PMID:27843740

  15. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  16. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.

  17. Comprehensive reconstruction and in silico analysis of Aspergillus niger genome-scale metabolic network model that accounts for 1210 ORFs.

    PubMed

    Lu, Hongzhong; Cao, Weiqiang; Ouyang, Liming; Xia, Jianye; Huang, Mingzhi; Chu, Ju; Zhuang, Yingping; Zhang, Siliang; Noorman, Henk

    2017-03-01

    Aspergillus niger is one of the most important cell factories for industrial enzymes and organic acids production. A comprehensive genome-scale metabolic network model (GSMM) with high quality is crucial for efficient strain improvement and process optimization. The lack of accurate reaction equations and gene-protein-reaction associations (GPRs) in the current best model of A. niger named GSMM iMA871, however, limits its application scope. To overcome these limitations, we updated the A. niger GSMM by combining the latest genome annotation and literature mining technology. Compared with iMA871, the number of reactions in iHL1210 was increased from 1,380 to 1,764, and the number of unique ORFs from 871 to 1,210. With the aid of our transcriptomics analysis, the existence of 63% ORFs and 68% reactions in iHL1210 can be verified when glucose was used as the only carbon source. Physiological data from chemostat cultivations, 13 C-labeled and molecular experiments from the published literature were further used to check the performance of iHL1210. The average correlation coefficients between the predicted fluxes and estimated fluxes from 13 C-labeling data were sufficiently high (above 0.89) and the prediction of cell growth on most of the reported carbon and nitrogen sources was consistent. Using the updated genome-scale model, we evaluated gene essentiality on synthetic and yeast extract medium, as well as the effects of NADPH supply on glucoamylase production in A. niger. In summary, the new A. niger GSMM iHL1210 contains significant improvements with respect to the metabolic coverage and prediction performance, which paves the way for systematic metabolic engineering of A. niger. Biotechnol. Bioeng. 2017;114: 685-695. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  18. Permanent Genetic Resources added to Molecular Ecology Resources Database 1 October 2009-30 November 2009.

    PubMed

    An, Junghwa; Bechet, Arnaud; Berggren, Asa; Brown, Sarah K; Bruford, Michael W; Cai, Qingui; Cassel-Lundhagen, Anna; Cezilly, Frank; Chen, Song-Lin; Cheng, Wei; Choi, Sung-Kyoung; Ding, X Y; Fan, Yong; Feldheim, Kevin A; Feng, Z Y; Friesen, Vicki L; Gaillard, Maria; Galaraza, Juan A; Gallo, Leonardo; Ganeshaiah, K N; Geraci, Julia; Gibbons, John G; Grant, William S; Grauvogel, Zac; Gustafsson, S; Guyon, Jeffrey R; Han, L; Heath, Daniel D; Hemmilä, S; Hogan, J Derek; Hou, B W; Jakse, Jernej; Javornik, Branka; Kaňuch, Peter; Kim, Kyung-Kil; Kim, Kyung-Seok; Kim, Sang-Gyu; Kim, Sang-In; Kim, Woo-Jin; Kim, Yi-Kyung; Klich, Maren A; Kreiser, Brian R; Kwan, Ye-Seul; Lam, Athena W; Lasater, Kelly; Lascoux, M; Lee, Hang; Lee, Yun-Sun; Li, D L; Li, Shao-Jing; Li, W Y; Liao, Xiaolin; Liber, Zlatko; Lin, Lin; Liu, Shaoying; Luo, Xin-Hui; Ma, Y H; Ma, Yajun; Marchelli, Paula; Min, Mi-Sook; Moccia, Maria Domenica; Mohana, Kumara P; Moore, Marcelle; Morris-Pocock, James A; Park, Han-Chan; Pfunder, Monika; Ivan, Radosavljević; Ravikanth, G; Roderick, George K; Rokas, Antonis; Sacks, Benjamin N; Saski, Christopher A; Satovic, Zlatko; Schoville, Sean D; Sebastiani, Federico; Sha, Zhen-Xia; Shin, Eun-Ha; Soliani, Carolina; Sreejayan, N; Sun, Zhengxin; Tao, Yong; Taylor, Scott A; Templin, William D; Shaanker, R Uma; Vasudeva, R; Vendramin, Giovanni G; Walter, Ryan P; Wang, Gui-Zhong; Wang, Ke-Jian; Wang, Y Q; Wattier, Rémi A; Wei, Fuwen; Widmer, Alex; Woltmann, Stefan; Won, Yong-Jin; Wu, Jing; Xie, M L; Xu, Genbo; Xu, Xiao-Jun; Ye, Hai-Hui; Zhan, Xiangjiang; Zhang, F; Zhong, J

    2010-03-01

    This article documents the addition of 411 microsatellite marker loci and 15 pairs of Single Nucleotide Polymorphism (SNP) sequencing primers to the Molecular Ecology Resources Database. Loci were developed for the following species: Acanthopagrus schlegeli, Anopheles lesteri, Aspergillus clavatus, Aspergillus flavus, Aspergillus fumigatus, Aspergillus oryzae, Aspergillus terreus, Branchiostoma japonicum, Branchiostoma belcheri, Colias behrii, Coryphopterus personatus, Cynogolssus semilaevis, Cynoglossus semilaevis, Dendrobium officinale, Dendrobium officinale, Dysoxylum malabaricum, Metrioptera roeselii, Myrmeciza exsul, Ochotona thibetana, Neosartorya fischeri, Nothofagus pumilio, Onychodactylus fischeri, Phoenicopterus roseus, Salvia officinalis L., Scylla paramamosain, Silene latifo, Sula sula, and Vulpes vulpes. These loci were cross-tested on the following species: Aspergillus giganteus, Colias pelidne, Colias interior, Colias meadii, Colias eurytheme, Coryphopterus lipernes, Coryphopterus glaucofrenum, Coryphopterus eidolon, Gnatholepis thompsoni, Elacatinus evelynae, Dendrobium loddigesii Dendrobium devonianum, Dysoxylum binectariferum, Nothofagus antarctica, Nothofagus dombeyii, Nothofagus nervosa, Nothofagus obliqua, Sula nebouxii, and Sula variegata. This article also documents the addition of 39 sequencing primer pairs and 15 allele specific primers or probes for Paralithodes camtschaticus. © 2010 Blackwell Publishing Ltd.

  19. Molecular identification and amphotericin B susceptibility testing of clinical isolates of Aspergillus from 11 hospitals in Korea.

    PubMed

    Heo, Min Seok; Shin, Jong Hee; Choi, Min Ji; Park, Yeon Joon; Lee, Hye Soo; Koo, Sun Hoe; Lee, Won Gil; Kim, Soo Hyun; Shin, Myung Geun; Suh, Soon Pal; Ryang, Dong Wook

    2015-11-01

    We investigated the species distribution and amphotericin B (AMB) susceptibility of Korean clinical Aspergillus isolates by using two Etests and the CLSI broth microdilution method. A total of 136 Aspergillus isolates obtained from 11 university hospitals were identified by sequencing the internal transcribed spacer (ITS) and β-tubulin genomic regions. Minimal inhibitory concentrations (MICs) of AMB were determined in Etests using Mueller-Hinton agar (Etest-MH) and RPMI agar (Etest-RPG), and categorical agreement with the CLSI method was assessed by using epidemiological cutoff values. ITS sequencing identified the following six Aspergillus species complexes: Aspergillus fumigatus (42.6% of the isolates), A. niger (23.5%), A. flavus (17.6%), A. terreus (11.0%), A. versicolor (4.4%), and A. ustus (0.7%). Cryptic species identifiable by β-tubulin sequencing accounted for 25.7% (35/136) of the isolates. Of all 136 isolates, 36 (26.5%) had AMB MICs of ≥2 μg/mL by the CLSI method. The categorical agreement of Etest-RPG with the CLSI method was 98% for the A. fumigatus, A. niger, and A. versicolor complexes, 87% for the A. terreus complex, and 37.5% for the A. flavus complex. That of Etest-MH was ≤75% for the A. niger, A. flavus, A. terreus, and A. versicolor complexes but was higher for the A. fumigatus complex (98.3%). Aspergillus species other than A. fumigatus constitute about 60% of clinical Aspergillus isolates, and reduced AMB susceptibility is common among clinical isolates of Aspergillus in Korea. Molecular identification and AMB susceptibility testing by Etest-RPG may be useful for characterizing Aspergillus isolates of clinical relevance.

  20. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    PubMed

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats or microsatellites are resourceful molecular genetic markers. There are only few reports of SSR identification and development in pineapple. Complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  1. Aflatoxigenic Aspergillus flavus and Aspergillus parasiticus strains in Hungarian maize fields.

    PubMed

    Sebők, Flóra; Dobolyi, Csaba; Zágoni, Dóra; Risa, Anita; Krifaton, Csilla; Hartman, Mátyás; Cserháti, Mátyás; Szoboszlay, Sándor; Kriszt, Balázs

    2016-12-01

    Due to the climate change, aflatoxigenic Aspergillus species and strains have appeared in several European countries, contaminating different agricultural commodities with aflatoxin. Our aim was to screen the presence of aflatoxigenic fungi in maize fields throughout the seven geographic regions of Hungary. Fungi belonging to Aspergillus section Flavi were isolated in the ratio of 26.9% and 42.3% from soil and maize samples in 2013, and these ratios decreased to 16.1% and 34.7% in 2014. Based on morphological characteristics and the sequence analysis of the partial calmodulin gene, all isolates proved to be Aspergillus flavus, except four strains, which were identified as Aspergillus parasiticus. About half of the A. flavus strains and all the A. parasiticus strains were able to synthesize aflatoxins. Aflatoxigenic Aspergillus strains were isolated from all the seven regions of Hungary. A. parasiticus strains were found in the soil of the regions Southern Great Plain and Southern Transdanubia and in a maize sample of the region Western Transdanubia. In spite of the fact that aflatoxins have rarely been detected in feeds and foods in Hungary, aflatoxigenic A. flavus and A. parasiticus strains are present in the maize culture throughout Hungary posing a potential threat to food safety.

  2. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    PubMed Central

    Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  3. The MAR databases: development and implementation of databases specific for marine metagenomics.

    PubMed

    Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

    2018-01-04

    We introduce the marine databases; MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incomplete sequenced prokaryotic genomes regardless level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. WheatGenome.info: A Resource for Wheat Genomics Resource.

    PubMed

    Lai, Kaitao

    2016-01-01

    An integrated database with a variety of Web-based systems named WheatGenome.info hosting wheat genome and genomic data has been developed to support wheat research and crop improvement. The resource includes multiple Web-based applications, which are implemented as a variety of Web-based systems. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This portal provides links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/ .

  5. Atypical Aspergillus parasiticus isolates from pistachio with aflR gene nucleotide insertion identical to Aspergillus sojae

    USDA-ARS?s Scientific Manuscript database

    Aflatoxins are the most toxic and carcinogenic secondary metabolites produced primarily by the filamentous fungi Aspergillus flavus and Aspergillus parasiticus. The toxins cause devastating economic losses because of strict regulations on distribution of contaminated products. Aspergillus sojae are...

  6. Elucidation of primary metabolic pathways in Aspergillus species: orphaned research in characterizing orphan genes.

    PubMed

    Andersen, Mikael Rørdam

    2014-11-01

    Primary metabolism affects all phenotypical traits of filamentous fungi. Particular examples include reacting to extracellular stimuli, producing precursor molecules required for cell division and morphological changes as well as providing monomer building blocks for production of secondary metabolites and extracellular enzymes. In this review, all annotated genes from four Aspergillus species have been examined. In this process, it becomes evident that 80-96% of the genes (depending on the species) are still without verified function. A significant proportion of the genes with verified metabolic functions are assigned to secondary or extracellular metabolism, leaving only 2-4% of the annotated genes within primary metabolism. It is clear that primary metabolism has not received the same attention in the post-genomic area as many other research areas--despite its role at the very centre of cellular function. However, several methods can be employed to use the metabolic networks in tandem with comparative genomics to accelerate functional assignment of genes in primary metabolism. In particular, gaps in metabolic pathways can be used to assign functions to orphan genes. In this review, applications of this from the Aspergillus genes will be examined, and it is proposed that, where feasible, this should be a standard part of functional annotation of fungal genomes. © The Author 2014. Published by Oxford University Press.

  7. Previously unknown species of Aspergillus.

    PubMed

    Gautier, M; Normand, A-C; Ranque, S

    2016-08-01

    The use of multi-locus DNA sequence analysis has led to the description of previously unknown 'cryptic' Aspergillus species, whereas classical morphology-based identification of Aspergillus remains limited to the section or species-complex level. The current literature highlights two main features concerning these 'cryptic' Aspergillus species. First, the prevalence of such species in clinical samples is relatively high compared with emergent filamentous fungal taxa such as Mucorales, Scedosporium or Fusarium. Second, it is clearly important to identify these species in the clinical laboratory because of the high frequency of antifungal drug-resistant isolates of such Aspergillus species. Matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) has recently been shown to enable the identification of filamentous fungi with an accuracy similar to that of DNA sequence-based methods. As MALDI-TOF MS is well suited to the routine clinical laboratory workflow, it facilitates the identification of these 'cryptic' Aspergillus species at the routine mycology bench. The rapid establishment of enhanced filamentous fungi identification facilities will lead to a better understanding of the epidemiology and clinical importance of these emerging Aspergillus species. Based on routine MALDI-TOF MS-based identification results, we provide original insights into the key interpretation issues of a positive Aspergillus culture from a clinical sample. Which ubiquitous species that are frequently isolated from air samples are rarely involved in human invasive disease? Can both the species and the type of biological sample indicate Aspergillus carriage, colonization or infection in a patient? Highly accurate routine filamentous fungi identification is central to enhance the understanding of these previously unknown Aspergillus species, with a vital impact on further improved patient care. Copyright © 2016 European Society of Clinical Microbiology and

  8. The strategies WDK: a graphical search interface and web development kit for functional genomics databases

    PubMed Central

    Fischer, Steve; Aurrecoechea, Cristina; Brunk, Brian P.; Gao, Xin; Harb, Omar S.; Kraemer, Eileen T.; Pennington, Cary; Treatman, Charles; Kissinger, Jessica C.; Roos, David S.; Stoeckert, Christian J.

    2011-01-01

    Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as a first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus), or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy’s flow obvious. The interface facilitates interactive adjustment of the component searches with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases, and successfully deployed by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at code.google.com/p/strategies-wdk. Database URL: www.eupathdb.org PMID:21705364

  9. The Innate Immune Database (IIDB)

    PubMed Central

    Korb, Martin; Rust, Aistair G; Thorsson, Vesteinn; Battail, Christophe; Li, Bin; Hwang, Daehee; Kennedy, Kathleen A; Roach, Jared C; Rosenberger, Carrie M; Gilchrist, Mark; Zak, Daniel; Johnson, Carrie; Marzolf, Bruz; Aderem, Alan; Shmulevich, Ilya; Bolouri, Hamid

    2008-01-01

    Background As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site . Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. Description We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to-date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immnuopreciptation (ChIP-chip) results. Our database can

  10. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    PubMed Central

    Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991

  11. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

    PubMed

    Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  12. Maize databases

    USDA-ARS?s Scientific Manuscript database

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  13. Analysis of disease-associated objects at the Rat Genome Database

    PubMed Central

    Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hayman, G. T.; Smith, Jennifer R.; Petri, Victoria; Lowry, Timothy F.; Nigam, Rajni; Dwinell, Melinda R.; Worthey, Elizabeth A.; Munzenmaier, Diane H.; Shimoyama, Mary; Jacob, Howard J.

    2013-01-01

    The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene–disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, ‘regulation of programmed cell death’ was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where ‘lipid metabolic process’ was the most enriched term. ‘Cytosol’ and ‘nucleus’ were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with ‘nucleus’ annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term–annotated gene list showed enrichment in physiologically related diseases. For example, the ‘regulation of blood pressure’ genes were enriched with cardiovascular disease annotations, and the ‘lipid metabolic process’ genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological

  14. DNA Barcoding Coupled with High Resolution Melting Analysis Enables Rapid and Accurate Distinction of Aspergillus species.

    PubMed

    Fidler, Gabor; Kocsube, Sandor; Leiter, Eva; Biro, Sandor; Paholcsek, Melinda

    2017-08-01

    We describe a high-resolution melting (HRM) analysis method that is rapid, reproducible, and able to identify reference strains and further 40 clinical isolates of Aspergillus fumigatus (14), A. lentulus (3), A. terreus (7), A. flavus (8), A. niger (2), A. welwitschiae (4), and A. tubingensis (2). Asp1 and Asp2 primer sets were designed to amplify partial sequences of the Aspergillus benA (beta-tubulin) genes in a closed-, single-tube system. Human placenta DNA, further Aspergillus (3), Candida (9), Fusarium (6), and Scedosporium (2) nucleic acids from type strains and clinical isolates were also included in this study to evaluate cross reactivity with other relevant pathogens causing invasive fungal infections. The barcoding capacity of this method proved to be 100% providing distinctive binomial scores; 14, 34, 36, 35, 25, 15, 26 when tested among species, while the within-species distinction capacity of the assay proved to be 0% based on the aligned thermodynamic profiles of the Asp1, Asp2 melting clusters allowing accurate species delimitation of all tested clinical isolates. The identification limit of this HRM assay was also estimated on Aspergillus reference gDNA panels where it proved to be 10-102 genomic equivalents (GE) except the A. fumigatus panel where it was 103 only. Furthermore, misidentification was not detected with human genomic DNA or with Candida, Fusarium, and Scedosporium strains. Our DNA barcoding assay introduced here provides results within a few hours, and it may possess further diagnostic utility when analyzing standard cultures supporting adequate therapeutic decisions. © The Author 2016. Published by Oxford University Press on behalf of The International Society for Human and Animal Mycology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets

    PubMed Central

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas

    2018-01-01

    Abstract ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270

  16. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway tool software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage includes also, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http

  17. Molecular strategy for identification in Aspergillus section Flavi.

    PubMed

    Godet, Marie; Munaut, Françoise

    2010-03-01

    Aspergillus flavus is one of the most common contaminants that produces aflatoxins in foodstuffs. It is also a human allergen and a pathogen of animals and plants. Aspergillus flavus is included in the Aspergillus section Flavi that comprises 11 closely related species producing different profiles of secondary metabolites. A six-step strategy has been developed that allows identification of nine of the 11 species. First, three real-time PCR reactions allowed us to discriminate four groups within the section: (1) A. flavus/Aspergillus oryzae/Aspergillus minisclerotigenes/Aspergillus parvisclerotigenus; (2) Aspergillus parasiticus/Aspergillus sojae/Aspergillus arachidicola; (3) Aspergillus tamarii/Aspergillus bombycis/Aspergillus pseudotamarii; and (4) Aspergillus nomius. Secondly, random amplification of polymorphic DNA (RAPD) amplifications or SmaI digestion allowed us to differentiate (1) A. flavus, A. oryzae and A. minisclerotigenes; (2) A. parasiticus, A. sojae and A. arachidicola; (3) A. tamarii, A. bombycis and A. pseudotamarii. Among the 11 species, only A. parvisclerotigenus cannot be differentiated from A. flavus. Using the results of real-time PCR, RAPD and SmaI digestion, a decision-making tree was drawn up to identify nine of the 11 species of section Flavi. In contrast to conventional morphological methods, which are often time-consuming, the molecular strategy proposed here is based mainly on real-time PCR, which is rapid and requires minimal handling.

  18. Taxonomic Characterization and Secondary Metabolite Profiling of Aspergillus Section Aspergillus Contaminating Feeds and Feedstuffs

    PubMed Central

    Greco, Mariana; Kemppainen, Minna; Pose, Graciela; Pardo, Alejandro

    2015-01-01

    Xerophilic fungal species of the genus Aspergillus are economically highly relevant due to their ability to grow on low water activity substrates causing spoilage of stored goods and animal feeds. These fungi can synthesize a variety of secondary metabolites, many of which show animal toxicity, creating a health risk for food production animals and to humans as final consumers, respectively. Animal feeds used for rabbit, chinchilla and rainbow trout production in Argentina were analysed for the presence of xerophilic Aspergillus section Aspergillus species. High isolation frequencies (>60%) were detected in all the studied rabbit and chinchilla feeds, while the rainbow trout feeds showed lower fungal charge (25%). These section Aspergillus contaminations comprised predominantly five taxa. Twenty isolates were subjected to taxonomic characterization using both ascospore SEM micromorphology and two independent DNA loci sequencing. The secondary metabolite profiles of the isolates were determined qualitatively by HPLC-MS. All the isolates produced neoechinulin A, 17 isolates were positive for cladosporin and echinulin, and 18 were positive for neoechinulin B. Physcion and preechinulin were detected in a minor proportion of the isolates. This is the first report describing the detailed species composition and the secondary metabolite profiles of Aspergillus section Aspergillus contaminating animal feeds. PMID:26364643

  19. Taxonomic Characterization and Secondary Metabolite Profiling of Aspergillus Section Aspergillus Contaminating Feeds and Feedstuffs.

    PubMed

    Greco, Mariana; Kemppainen, Minna; Pose, Graciela; Pardo, Alejandro

    2015-09-02

    Xerophilic fungal species of the genus Aspergillus are economically highly relevant due to their ability to grow on low water activity substrates causing spoilage of stored goods and animal feeds. These fungi can synthesize a variety of secondary metabolites, many of which show animal toxicity, creating a health risk for food production animals and to humans as final consumers, respectively. Animal feeds used for rabbit, chinchilla and rainbow trout production in Argentina were analysed for the presence of xerophilic Aspergillus section Aspergillus species. High isolation frequencies (>60%) were detected in all the studied rabbit and chinchilla feeds, while the rainbow trout feeds showed lower fungal charge (25%). These section Aspergillus contaminations comprised predominantly five taxa. Twenty isolates were subjected to taxonomic characterization using both ascospore SEM micromorphology and two independent DNA loci sequencing. The secondary metabolite profiles of the isolates were determined qualitatively by HPLC-MS. All the isolates produced neoechinulin A, 17 isolates were positive for cladosporin and echinulin, and 18 were positive for neoechinulin B. Physcion and preechinulin were detected in a minor proportion of the isolates. This is the first report describing the detailed species composition and the secondary metabolite profiles of Aspergillus section Aspergillus contaminating animal feeds.

  20. Genomics Portals: integrative web-platform for mining genomics data.

    PubMed

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  1. Genomics Portals: integrative web-platform for mining genomics data

    PubMed Central

    2010-01-01

    Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909

  2. Minos as a novel Tc1/mariner-type transposable element for functional genomic analysis in Aspergillus nidulans.

    PubMed

    Evangelinos, Minoas; Anagnostopoulos, Gerasimos; Karvela-Kalogeraki, Iliana; Stathopoulou, Panagiota M; Scazzocchio, Claudio; Diallinas, George

    2015-08-01

    Transposons constitute powerful genetic tools for gene inactivation, exon or promoter trapping and genome analyses. The Minos element from Drosophila hydei, a Tc1/mariner-like transposon, has proved as a very efficient tool for heterologous transposition in several metazoa. In filamentous fungi, only a handful of fungal-specific transposable elements have been exploited as genetic tools, with the impala Tc1/mariner element from Fusarium oxysporum being the most successful. Here, we developed a two-component transposition system to manipulate Minos transposition in Aspergillus nidulans (AnMinos). Our system allows direct selection of transposition events based on re-activation of niaD, a gene necessary for growth on nitrate as a nitrogen source. On average, among 10(8) conidiospores, we obtain up to ∼0.8×10(2) transposition events leading to the expected revertant phenotype (niaD(+)), while ∼16% of excision events lead to AnMinos loss. Characterized excision footprints consisted of the four terminal bases of the transposon flanked by the TA target duplication and led to no major DNA rearrangements. AnMinos transposition depends on the presence of its homologous transposase. Its frequency was not significantly affected by temperature, UV irradiation or the transcription status of the original integration locus (niaD). Importantly, transposition is dependent on nkuA, encoding an enzyme essential for non-homologous end joining of DNA in double-strand break repair. AnMinos proved to be an efficient tool for functional analysis as it seems to transpose in different genomic loci positions in all chromosomes, including a high proportion of integration events within or close to genes. We have used Minos to obtain morphological and toxic analogue resistant mutants. Interestingly, among morphological mutants some seem to be due to Minos-elicited over-expression of specific genes, rather than gene inactivation. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Using FlyBase, a Database of Drosophila Genes & Genomes

    PubMed Central

    Marygold, Steven J.; Crosby, Madeline A.; Goodman, Joshua L.

    2016-01-01

    SUMMARY For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries. PMID:27730573

  4. MIPS plant genome information resources.

    PubMed

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  5. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.

    PubMed

    Rigden, Daniel J; Fernández, Xosé M

    2018-01-04

    The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection

    PubMed Central

    Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K

    2008-01-01

    Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression

  7. First Report of an Atypical New Aspergillus parasiticus Isolates with Nucleotides Insertion in aflR Gene Identical to Aspergillus sojae

    USDA-ARS?s Scientific Manuscript database

    Aflatoxins are toxic and carcinogenic secondary metabolites produced primarily by the filamentous fungi Aspergillus favus and Aspergillus parasitic and cause toxin contamination in food chain worldwide. Aspergillus oryzae and Aspergillus sojae are highly valued as koji molds in the traditional prep...

  8. The state of proteome profiling in the fungal genus Aspergillus.

    PubMed

    Kim, Yonghyun; Nandakumar, M P; Marten, Mark R

    2008-03-01

    Aspergilli are an important genus of filamentous fungi that contribute to a multibillion dollar industry. Since many fungal genome sequencing were recently completed, it would be advantageous to profile their proteome to better understand the fungal cell factory. Here, we review proteomic data generated for the Aspergilli in recent years. Thus far, a combined total of 28 cell surface, 102 secreted and 139 intracellular proteins have been identified based on 10 different studies on Aspergillus proteomics. A summary proteome map highlighting identified proteins in major metabolic pathway is presented.

  9. GenomeVista

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poliakov, Alexander; Couronne, Olivier

    2002-11-04

    Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous SNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program tomore » find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates which are then globally aligned using the AVID global alignment program. In the last step conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queve of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interfere package by providing auto-reconnect functionality and improved error handling.« less

  10. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

    PubMed Central

    Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

    2002-01-01

    Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263

  11. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  12. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE PAGES

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed; ...

    2018-01-06

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers ofmore » heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. As a result, this strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.« less

  13. 5S rRNA Promoter for Guide RNA Expression Enabled Highly Efficient CRISPR/Cas9 Genome Editing in Aspergillus niger.

    PubMed

    Zheng, Xiaomei; Zheng, Ping; Zhang, Kun; Cairns, Timothy C; Meyer, Vera; Sun, Jibin; Ma, Yanhe

    2018-04-30

    The CRISPR/Cas9 system is a revolutionary genome editing tool. However, in eukaryotes, search and optimization of a suitable promoter for guide RNA expression is a significant technical challenge. Here we used the industrially important fungus, Aspergillus niger, to demonstrate that the 5S rRNA gene, which is both highly conserved and efficiently expressed in eukaryotes, can be used as a guide RNA promoter. The gene editing system was established with 100% rates of precision gene modifications among dozens of transformants using short (40-bp) homologous donor DNA. This system was also applicable for generation of designer chromosomes, as evidenced by deletion of a 48 kb gene cluster required for biosynthesis of the mycotoxin fumonisin B1. Moreover, this system also facilitated simultaneous mutagenesis of multiple genes in A. niger. We anticipate that the use of the 5S rRNA gene as guide RNA promoter can broadly be applied for engineering highly efficient eukaryotic CRISPR/Cas9 toolkits. Additionally, the system reported here will enable development of designer chromosomes in model and industrially important fungi.

  14. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE PAGES

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed; ...

    2018-01-06

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers ofmore » heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. This strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.« less

  15. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers ofmore » heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. This strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.« less

  16. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers ofmore » heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. As a result, this strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.« less

  17. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    PubMed

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. dBBQs: dataBase of Bacterial Quality scores.

    PubMed

    Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David

    2017-12-28

    It is well-known that genome sequencing technologies are becoming significantly cheaper and faster. As a result of this, the exponential growth in sequencing data in public databases allows us to explore ever growing large collections of genome sequences. However, it is less known that the majority of available sequenced genome sequences in public databases are not complete, drafts of varying qualities. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database. Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial genomes. The genome quality score for each was calculated by four different measurements: assembly quality, number of rRNA and tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers fast searching and download features which the result can be used for further analysis. In addition, the search results are shown in interactive JavaScript chart framework using DC.js. The analysis of quality scores across major public genome databases find that around 68% of the genomes are of acceptable quality for many uses. dBBQs (available at http://arc-gem.uams.edu/dbbqs ) provides genome quality scores for all available prokaryotic genome sequences with a user-friendly Web-interface. These scores can be used as cut-offs to get a high-quality set of genomes for testing bioinformatics tools or improving the analysis. Moreover, all data of the four measurements that were combined to make the quality score for each genome, which can potentially be used for further analysis. dBBQs will be updated regularly and is freely use for non-commercial purpose.

  19. Aspergillus fumigatus and Aspergillosis

    PubMed Central

    Latgé, Jean-Paul

    1999-01-01

    Aspergillus fumigatus is one of the most ubiquitous of the airborne saprophytic fungi. Humans and animals constantly inhale numerous conidia of this fungus. The conidia are normally eliminated in the immunocompetent host by innate immune mechanisms, and aspergilloma and allergic bronchopulmonary aspergillosis, uncommon clinical syndromes, are the only infections observed in such hosts. Thus, A. fumigatus was considered for years to be a weak pathogen. With increases in the number of immunosuppressed patients, however, there has been a dramatic increase in severe and usually fatal invasive aspergillosis, now the most common mold infection worldwide. In this review, the focus is on the biology of A. fumigatus and the diseases it causes. Included are discussions of (i) genomic and molecular characterization of the organism, (ii) clinical and laboratory methods available for the diagnosis of aspergillosis in immunocompetent and immunocompromised hosts, (iii) identification of host and fungal factors that play a role in the establishment of the fungus in vivo, and (iv) problems associated with antifungal therapy. PMID:10194462

  20. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    PubMed

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

    PubMed

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-07-09

    The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with

  2. Aspergillus Bronchitis in Patients with Cystic Fibrosis.

    PubMed

    Brandt, Claudia; Roehmel, Jobst; Rickerts, Volker; Melichar, Volker; Niemann, Nadja; Schwarz, Carsten

    2018-02-01

    Aspergillus fumigatus frequently colonizes the airways of patients with cystic fibrosis (CF) and may cause various severe infections, such as bronchitis. Serological data, sputum dependent markers and longitudinal data of treated cases of Aspergillus bronchitis were evaluated for further description of this infection. This study, which comprises three substudies, aimed to analyze epidemiological data of Aspergillus in CF and the entity of Aspergillus bronchitis. In a first step, data of the German Cystic Fibrosis Registry were used to evaluate the frequency of Aspergillus colonization in patients with CF (n = 2599). Then a retrospective analysis of 10 cases of Aspergillus bronchitis was performed to evaluate longitudinal data for lung function and clinical presentation parameters: sputum production, cough and physical capacity. Finally, a prospective cohort study (n = 22) was conducted to investigate serological markers for Aspergillus bronchitis: total serum IgE, specific serum IgE, specific serum IgG, as well as sputum galactomannan, real-time PCR detection of Aspergillus DNA in sputum and fungal cultures. Analysis of the German CF registry revealed an Aspergillus colonization rate of 32.5% among the 2599 patients. A retrospective data analysis of 10 treated cases revealed the clinical course of Aspergillus bronchitis, including repeated positive sputum culture findings for A. fumigatus, no antibiotic treatment response, total serum IgE levels <200 kU/l, no observation of new pulmonary infiltrates and appropriate antifungal treatment response. Antifungal treatment durations of 4 ± 1.6 (2-6) weeks significantly reduced cough (P = 0.0067), sputum production (P < 0.0001) and lung function measures (P = 0.0358) but not physical capacity (P = 0.0794). From this retrospective study, a prevalence of 1.6% was calculated. In addition, two cases of Aspergillus bronchitis were identified in the prospective cohort study according to immunological, molecular

  3. An Aspergillus flavus secondary metabolic gene cluster containing a hybrid PKS-NRPS is necessary for synthesis of the 2-pyridones, leporins

    USDA-ARS?s Scientific Manuscript database

    The genome of the filamentous fungus, Aspergillus flavus, has been shown to harbor as many as 55 putative secondary metabolic gene clusters including the one responsible for production of the toxic and carcinogenic, polyketide synthase (PKS)-derived family of secondary metabolites termed aflatoxins....

  4. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  5. The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes.

    PubMed

    Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin

    2011-01-01

    The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and trancriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database that provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.

  6. Aspergillus species as emerging causative agents of onychomycosis.

    PubMed

    Nouripour-Sisakht, S; Mirhendi, H; Shidfar, M R; Ahmadi, B; Rezaei-Matehkolaei, A; Geramishoar, M; Zarei, F; Jalalizand, N

    2015-06-01

    Onychomycosis is a common nail infection caused by dermatophytes, non-dermatophyte molds (NDM), and yeasts. Aspergillus species are emerging as increasing causes of toenail onychomycosis. The purpose of this study was species delineation of Aspergillus spp. isolated from patients with onychomycosis. During a period of one year (2012-2013), nail samples were collected from patients clinically suspected of onychomycosis and subjected to microscopic examination and culture. Species identification was performed based on macro- and micro-morphology of colonies. For precise species identification, PCR-amplification and sequencing of the beta-tubulin gene followed by BLAST queries were performed where required. A total of 463/2,292 (20.2%) tested nails were diagnosed with onychomycosis. Among the positive specimens, 154 cases (33.2%) were identified as saprophytic NDM onychomycosis, 135 (29.2%) of which were attributable to Aspergillus. Aspergillus species isolated from the infected nails included Aspergillus flavus (77.3%, n=119), Aspergillus niger (n=4), Aspergillus tubingensis (n=4), Aspergillus terreus (n=3), Aspergillus sydowii (n=2), Aspergillus spp. (n=2), and Aspergillus candidus (n=1). Among the patients diagnosed with onychomycosis due to Aspergillus (average patient age, 47.4 years), 40 had fingernail and 95 toenail involvement. The large toenails were most commonly affected. This study identified a markedly high occurrence of A. flavus, and this fungus appears to be an emerging cause of saprophytic onychomycosis in Iran. The study moreover highlights the necessity of differentiating between dermatophytic and non-dermatophytic nail infections for informed decisions on appropriate therapy. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  7. Expanded microbial genome coverage and improved protein family annotation in the COG database

    PubMed Central

    Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the

  8. Gramene database: navigating plant comparative genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  9. ReprDB and panDB: minimalist databases with maximal microbial representation.

    PubMed

    Zhou, Wei; Gay, Nicole; Oh, Julia

    2018-01-18

    Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.

  10. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    DOE PAGES

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islandsmore » in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.« less

  11. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293.

    PubMed

    Kanhayuwa, Lakkhana; Coutts, Robert H A

    2016-01-01

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4-14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140-493 bp and were present in low copy number in the genome and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified and their 5'-5S rRNA-related regions show 50-65% and 60-75% similarity to respectively A. fumigatus 5S rRNAs and SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259-343 bp and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity.

  12. Development of a database system for mapping insertional mutations onto the mouse genome with large-scale experimental data

    PubMed Central

    2009-01-01

    Background Insertional mutagenesis is an effective method for functional genomic studies in various organisms. It can rapidly generate easily tractable mutations. A large-scale insertional mutagenesis with the piggyBac (PB) transposon is currently performed in mice at the Institute of Developmental Biology and Molecular Medicine (IDM), Fudan University in Shanghai, China. This project is carried out via collaborations among multiple groups overseeing interconnected experimental steps and generates a large volume of experimental data continuously. Therefore, the project calls for an efficient database system for recording, management, statistical analysis, and information exchange. Results This paper presents a database application called MP-PBmice (insertional mutation mapping system of PB Mutagenesis Information Center), which is developed to serve the on-going large-scale PB insertional mutagenesis project. A lightweight enterprise-level development framework Struts-Spring-Hibernate is used here to ensure constructive and flexible support to the application. The MP-PBmice database system has three major features: strict access-control, efficient workflow control, and good expandability. It supports the collaboration among different groups that enter data and exchange information on daily basis, and is capable of providing real time progress reports for the whole project. MP-PBmice can be easily adapted for other large-scale insertional mutation mapping projects and the source code of this software is freely available at http://www.idmshanghai.cn/PBmice. Conclusion MP-PBmice is a web-based application for large-scale insertional mutation mapping onto the mouse genome, implemented with the widely used framework Struts-Spring-Hibernate. This system is already in use by the on-going genome-wide PB insertional mutation mapping project at IDM, Fudan University. PMID:19958505

  13. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  14. BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers.

    PubMed

    Meyer, Michael J; Geske, Philip; Yu, Haiyuan

    2016-05-15

    Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency-molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data is not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). haiyuan.yu@cornell.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Species of Aspergillus section Aspergillus from clinical samples in the United States.

    PubMed

    Siqueira, João P Z; Sutton, Deanna A; Gené, Josepa; García, Dania; Wiederhold, Nathan; Guarro, Josep

    2018-07-01

    The diversity of Aspergillus species in clinical samples is continuously increasing. Species under the former name Eurotium, currently accommodated in section Aspergillus of the genus Aspergillus, are xerophilic fungi widely found in the human environment and able to grow on substrates with low water activity. However, their prevalence in the clinical setting is poorly known. We have studied the presence of these species in a set of clinical samples from the United States using a multilocus sequence analysis based on the internal transcribed spacer (ITS) region of the rRNA, and fragments of the genes β-tubulin (BenA), calmodulin (CaM), and polymerase II second largest subunit (RPB2). A total of 25 isolates were studied and identified as follows: A. montevidensis (44%), A. chevalieri (36%), A. pseudoglaucus (8%), and A. costiformis (4%). A new species Aspergillus microperforatus is also proposed, which represented 8% of the isolates studied and is characterized by uniseriate conidial heads, subglobose to pyriform vesicles, rough conidia, globose to subglobose cleistothecia, and lenticular and smooth ascospores. The in vitro antifungal activity of eight clinically available antifungals was also determined against these isolates, with the echinocandins and posaconazole having the most potent activity.

  16. Tracheobronchial Manifestations of Aspergillus Infections

    PubMed Central

    Krenke, Rafal; Grabczak, Elzbieta M.

    2011-01-01

    Human lungs are constantly exposed to a large number of Aspergillus spores which are present in ambient air. These spores are usually harmless to immunocompetent subjects but can produce a symptomatic disease in patients with impaired antifungal defense. In a small percentage of patients, the trachea and bronchi may be the main or even the sole site of Aspergillus infection. The clinical entities that may develop in tracheobronchial location include saprophytic, allergic and invasive diseases. Although this review is focused on invasive Aspergillus tracheobronchial infections, some aspects of allergic and saprophytic tracheobronchial diseases are also discussed in order to present the whole spectrum of tracheobronchial aspergillosis. To be consistent with clinical practice, an approach basing on specific conditions predisposing to invasive Aspergillus tracheobronchial infections is used to present the differences in the clinical course and prognosis of these infections. Thus, invasive or potentially invasive Aspergillus airway diseases are discussed separately in three groups of patients: (1) lung transplant recipients, (2) highly immunocompromised patients with hematologic malignancies and/or patients undergoing hematopoietic stem cell transplantation, and (3) the remaining, less severely immunocompromised patients or even immunocompetent subjects. PMID:22194666

  17. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.

    PubMed

    Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang

    2016-04-21

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to distribute uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information of the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes with predominant abundance in the coding regions of genes. We developed a publicly-accessible Grape-CRISPR database for facilitating the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among

  18. Treatment strategies for Aspergillus infections.

    PubMed

    Chiller, Tom M.; Stevens, David A.

    2000-04-01

    Infections caused by Aspergillus species consist of many different disease presentations, ranging from relatively benign asthma in atopic disease to life-threatening systemic invasive infections. The spectrum of disease manifestations is determined by a combination of genetic predisposition, host immune system defects, and virulence of the Aspergillus species. For the purposes of this discussion, we will address three principal entities: invasive aspergillosis, both primary and disseminated, pulmonary aspergilloma, and allergic bronchopulmonary aspergillosis. Amphotericin B is the standard of treatment for severe Aspergillus infections, despite the fact that mortality in these patients remains high. Alternative therapies such as combination regimens and itraconazole also have efficacy against Aspergillus infections. We discuss the role of current therapies, the potential role of drugs in development, and the results of ongoing research with combination and immunotherapies. Copyright 2000 Harcourt Publishers Ltd.

  19. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    PubMed

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or

  20. Aspergillus, Penicillium and Talaromyces isolated from house dust samples collected around the world

    PubMed Central

    Visagie, C.M.; Hirooka, Y.; Tanney, J.B.; Whitfield, E.; Mwange, K.; Meijer, M.; Amend, A.S.; Seifert, K.A.; Samson, R.A.

    2014-01-01

    As part of a worldwide survey of the indoor mycobiota, dust was collected from nine countries. Analyses of dust samples included the culture-dependent dilution-to-extinction method and the culture-independent 454-pyrosequencing. Of the 7 904 isolates, 2 717 isolates were identified as belonging to Aspergillus, Penicillium and Talaromyces. The aim of this study was to identify isolates to species level and describe the new species found. Secondly, we wanted to create a reliable reference sequence database to be used for next-generation sequencing projects. Isolates represented 59 Aspergillus species, including eight undescribed species, 49 Penicillium species of which seven were undescribed and 18 Talaromyces species including three described here as new. In total, 568 ITS barcodes were generated, and 391 β-tubulin and 507 calmodulin sequences, which serve as alternative identification markers. PMID:25492981

  1. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

    PubMed

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.

  2. Using FlyBase, a Database of Drosophila Genes and Genomes.

    PubMed

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources.FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback.This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  3. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  4. Aspergillus baeticus sp. nov. and Aspergillus thesauricus sp. nov., two species in section Usti from Spanish caves.

    PubMed

    Nováková, Alena; Hubka, Vit; Saiz-Jimenez, Cesareo; Kolarik, Miroslav

    2012-11-01

    Two novel species of Aspergillus that are clearly distinct from all known species in section Usti were revealed during a study of microfungal communities in Spanish caves. The novel species identified in this study and additional species of Aspergillus section Usti are associated with places and substrates related to human activities in caves. Novel species are described using data from four loci (ITS, benA, caM and rpb2), morphology and basic chemical and physiological analyses. Members of the species Aspergillus thesauricus sp. nov. were isolated from various substrates, including decaying organic matter, cave air and cave sediment of the Cueva del Tesoro Cave (the Treasure cave); the species is represented by twelve isolates and is most closely related to the recently described Aspergillus germanicus. Members of the species Aspergillus baeticus sp. nov. were isolated from cave sediment in the Gruta de las Maravillas Cave (the Grotto of the Marvels); the species is represented by two isolates. An additional isolate was found in the Cueva del Tesoro Cave and in the Demänovská Peace Cave (Slovakia), suggesting a potentially wide distribution of this micro-organism. The species is related to Aspergillus ustus and Aspergillus pseudoustus. Both species were unable to grow at 37 °C, and a weakly positive, light greenish yellow Ehrlich reaction was observed in A. thesauricus. Unique morphological features alone are sufficient to distinguish both species from related taxa.

  5. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  6. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  7. Characterization of a polyketide synthase in Aspergillus niger whose product is a precursor for both dihydroxynaphthalene (DHN) melanin and naphtho-γ-pyrone.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiang, Yi Ming; Meyer, Kristen M; Praseuth, Michael

    2010-12-06

    The genome sequencing of the fungus Aspergillus niger, an industrial workhorse, uncovered a large cache of genes encoding enzymes thought to be involved in the production of secondary metabolites yet to be identified. Identification and structural characterization of many of these predicted secondary metabolites are hampered by their low concentration relative to the known A. niger metabolites such as the naphtho-γ-pyrone family of polyketides. We deleted a nonreducing PKS gene in A. niger strain ATCC 11414, a daughter strain of A. niger ATCC strain 1015 whose genome was sequenced by the DOE Joint Genome Institute. This PKS encoding gene ismore » a predicted ortholog of alb1 from Aspergillus fumigatus which is responsible for production of YWA1, a precursor of fungal DHN melanin. Our results show that the A. niger alb1 PKS is responsible for the production of the polyketide precursor for DHN melanin biosynthesis. Deletion of alb1 elimnates the production of major metabolites, naphtho-γ-pyrones. The generation of an A. niger strain devoid of naphtho-γ-pyrones will greatly facilitate the elucidation of cryptic biosynthetic pathways in this organism.« less

  8. Recurrent prosthetic valve endocarditis caused by Aspergillus delacroxii (formerly Aspergillus nidulans var. echinulatus)

    PubMed Central

    Uhrin, Gábor Balázs; Jensen, Rasmus Hare; Korup, Eva; Grønlund, Jens; Hjort, Ulla; Moser, Claus; Arendrup, Maiken Cavling; Schønheyder, Henrik Carl

    2015-01-01

    We report Aspergillus delacroxii (formerly Aspergillus nidulans var. echinulatus) causing recurrent prosthetic valve endocarditis. The fungus was the sole agent detected during replacement of a mechanical aortic valve conduit due to abscess formation. Despite extensive surgery and anti-fungal treatment, the patient had a cerebral hemorrhage 4 months post-surgery prompting a diagnosis of recurrent prosthetic valve endocarditis and fungemia. PMID:26909244

  9. Development in Aspergillus

    PubMed Central

    Krijgsheld, P.; Bleichrodt, R.; van Veluw, G.J.; Wang, F.; Müller, W.H.; Dijksterhuis, J.; Wösten, H.A.B.

    2013-01-01

    The genus Aspergillus represents a diverse group of fungi that are among the most abundant fungi in the world. Germination of a spore can lead to a vegetative mycelium that colonizes a substrate. The hyphae within the mycelium are highly heterogeneous with respect to gene expression, growth, and secretion. Aspergilli can reproduce both asexually and sexually. To this end, conidiophores and ascocarps are produced that form conidia and ascospores, respectively. This review describes the molecular mechanisms underlying growth and development of Aspergillus. PMID:23450714

  10. Molecular Diagnostic Testing for Aspergillus

    PubMed Central

    Powers-Fletcher, Margaret V.

    2016-01-01

    The direct detection of Aspergillus nucleic acid in clinical specimens has the potential to improve the diagnosis of aspergillosis by offering more rapid and sensitive identification of invasive infections than is possible with traditional techniques, such as culture or histopathology. Molecular tests for Aspergillus have been limited historically by lack of standardization and variable sensitivities and specificities. Recent efforts have been directed at addressing these limitations and optimizing assay performance using a variety of specimen types. This review provides a summary of standardization efforts and outlines the complexities of molecular testing for Aspergillus in clinical mycology. PMID:27487954

  11. Characterization of oxylipins and dioxygenase genes in the asexual fungus Aspergillus niger

    PubMed Central

    2009-01-01

    Background Aspergillus niger is an ascomycetous fungus that is known to reproduce through asexual spores, only. Interestingly, recent genome analysis of A. niger has revealed the presence of a full complement of functional genes related to sexual reproduction [1]. An example of such genes are the dioxygenase genes which in Aspergillus nidulans, have been shown to be connected to oxylipin production and regulation of both sexual and asexual sporulation [2-4]. Nevertheless, the presence of sex related genes alone does not confirm sexual sporulation in A. niger. Results The current study shows experimentally that A. niger produces the oxylipins 8,11-dihydroxy octadecadienoic acid (8,11-diHOD), 5,8-dihydroxy octadecadienoic acid (5,8-diHOD), lactonized 5,8-diHOD, 8-hydroxy octadecadienoic acid (8-HOD), 10-hydroxy octadecadienoic acid (10-HOD), small amounts of 8-hydroxy octadecamonoenoic acid (8-HOM), 9-hydroxy octadecadienoic acid (9-HOD) and 13-hydroxy octadecadienoic acid (13-HOD). Importantly, this study shows that the A. niger genome contains three putative dioxygenase genes, ppoA, ppoC and ppoD. Expression analysis confirmed that all three genes are indeed expressed under the conditions tested. Conclusion A. niger produces the same oxylipins and has similar dioxygenase genes as A. nidulans. Their presence could point towards the existence of sexual reproduction in A. niger or a broader role for the gene products in physiology, than just sexual development. PMID:19309517

  12. [Survival Strategies of Aspergillus in the Human Body].

    PubMed

    Tashiro, Masato; Izumikawa, Koichi

    2017-01-01

     The human body is a hostile environment for Aspergillus species, which originally live outside the human body. There are lots of elimination mechanisms against Aspergillus inhaled into the human body, such as high body temperature, soluble lung components, mucociliary clearance mechanism, or responses of phagocytes. Aspergillus fumigatus, which is the primary causative agent of human infections among the human pathogenic species of Aspergillus, defend itself from the hostile human body environment by various mechanisms, such as thermotolerance, mycotoxin production, and characteristic morphological features. Here we review mechanisms of defense in Aspergillus against elimination from the human body.

  13. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    PubMed

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the

  14. In silico analysis of protein toxin and bacteriocins from Lactobacillus paracasei SD1 genome and available online databases

    PubMed Central

    Surachat, Komwit; Sangket, Unitsa; Deachamag, Panchalika; Chotigeat, Wilaiwan

    2017-01-01

    Lactobacillus paracasei SD1 is a potential probiotic strain due to its ability to survive several conditions in human dental cavities. To ascertain its safety for human use, we therefore performed a comprehensive bioinformatics analysis and characterization of the bacterial protein toxins produced by this strain. We report the complete genome of Lactobacillus paracasei SD1 and its comparison to other Lactobacillus genomes. Additionally, we identify and analyze its protein toxins and antimicrobial proteins using reliable online database resources and establish its phylogenetic relationship with other bacterial genomes. Our investigation suggests that this strain is safe for human use and contains several bacteriocins that confer health benefits to the host. An in silico analysis of protein-protein interactions between the target bacteriocins and the microbial proteins gtfB and luxS of Streptococcus mutans was performed and is discussed here. PMID:28837656

  15. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

  16. Aspergillus discitis with acute disc abscess.

    PubMed

    Assaad, W; Nuchikat, P S; Cohen, L; Esguerra, J V; Whittier, F C

    1994-10-01

    Aspergillus osteomyelitis of the vertebral body and disc space is rare. This report discusses a case that occurred in an immunosuppressed 29-year-old man and reviews the pertinent medical literature. To review the management and treatment of Aspergillus osteomyelitis of the vertebral body and disc space. The patient presented with acute neurologic compromise resulting from L5-S1 discitis and a large epidural soft tissue component secondary to the Aspergillus infection. The patient underwent aggressive surgical debridement along with treatment with amphotericin B and had a complete clinical recovery. The authors recommend a combined medical-surgical approach in most cases of vertebral Aspergillus osteomyelitis. Early surgery with vigorous surgical debridement along with antifungal treatment seems to yield a good outcome.

  17. Coverage of whole proteome by structural genomics observed through protein homology modeling database

    PubMed Central

    Yamaguchi, Akihiro; Go, Mitiko

    2006-01-01

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617

  18. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  19. Phylogeny, identification and nomenclature of the genus Aspergillus

    PubMed Central

    Samson, R.A.; Visagie, C.M.; Houbraken, J.; Hong, S.-B.; Hubka, V.; Klaassen, C.H.W.; Perrone, G.; Seifert, K.A.; Susca, A.; Tanney, J.B.; Varga, J.; Kocsubé, S.; Szigeti, G.; Yaguchi, T.; Frisvad, J.C.

    2014-01-01

    Aspergillus comprises a diverse group of species based on morphological, physiological and phylogenetic characters, which significantly impact biotechnology, food production, indoor environments and human health. Aspergillus was traditionally associated with nine teleomorph genera, but phylogenetic data suggest that together with genera such as Polypaecilum, Phialosimplex, Dichotomomyces and Cristaspora, Aspergillus forms a monophyletic clade closely related to Penicillium. Changes in the International Code of Nomenclature for algae, fungi and plants resulted in the move to one name per species, meaning that a decision had to be made whether to keep Aspergillus as one big genus or to split it into several smaller genera. The International Commission of Penicillium and Aspergillus decided to keep Aspergillus instead of using smaller genera. In this paper, we present the arguments for this decision. We introduce new combinations for accepted species presently lacking an Aspergillus name and provide an updated accepted species list for the genus, now containing 339 species. To add to the scientific value of the list, we include information about living ex-type culture collection numbers and GenBank accession numbers for available representative ITS, calmodulin, β-tubulin and RPB2 sequences. In addition, we recommend a standard working technique for Aspergillus and propose calmodulin as a secondary identification marker. PMID:25492982

  20. The development of large-scale de-identified biomedical databases in the age of genomics-principles and challenges.

    PubMed

    Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K

    2018-04-10

    Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.

  1. Aspergillus colonization in patients with bronchogenic carcinoma.

    PubMed

    Ali, Sana; Malik, Abida; Bhargava, Rakesh; Shahid, Mohammad; Fatima, Nazish

    2014-05-01

    Aspergillus antigens such as galactomannan antigen, a cell wall polysaccharide, can be detected in patient's serum or bronchoalveolar lavage. To study the prevalence of Aspergillus infection in patients with bronchogenic carcinoma, we measured galactomannan antigen in serum and bronchoalveolar lavage samples of patients with bronchogenic carcinoma. The study was conducted on 45 bronchogenic carcinoma patients. The diagnosis of lung cancer was confirmed by bronchoscopy, histopathological and radiological examinations. Bronchoalveolar lavage fluid collected from each patient by fiberoptic bronchoscopy was subjected to direct microscopy and culture on Sabouraud's dextrose agar and Czapek-Dox agar, and Aspergillus galactomannan antigen was measured in serum and bronchoalveolar lavage samples. The majority of patients were male (93.3%) in the age group 51-60 years, 88.9% were addicted to gutka chewing, and 82.1% were addicted to smoking. Most patients complained of cough (73%) and shortness of breath (51.1%). Squamous cell carcinoma (64.4%) was the most common malignancy, followed by adenocarcinoma (13.3%). On culture of bronchoalveolar lavage samples, 35.5% showed growth of Aspergillus spp. (Aspergillus fumigatus in 17.8%, Aspergillus flavus in 13.3%, and Aspergillus niger in 4.4%). Galactomannan antigen was detected in 58.3% of bronchoalveolar lavage samples and 47.2% of serum samples. There is a high prevalence of aspergillosis in patients with lung carcinoma, especially among smokers and gutka chewers.

  2. Gramene database in 2010: updates and extensions.

    PubMed

    Youens-Clark, Ken; Buckler, Ed; Casstevens, Terry; Chen, Charles; Declerck, Genevieve; Derwent, Paul; Dharmawardhana, Palitha; Jaiswal, Pankaj; Kersey, Paul; Karthikeyan, A S; Lu, Jerry; McCouch, Susan R; Ren, Liya; Spooner, William; Stein, Joshua C; Thomason, Jim; Wei, Sharon; Ware, Doreen

    2011-01-01

    Now in its 10th year, the Gramene database (http://www.gramene.org) has grown from its primary focus on rice, the first fully-sequenced grass genome, to become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape in addition to several species of rice. Gramene began with the addition of an Ensembl genome browser and has expanded in the last decade to become a robust resource for plant genomics hosting a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully-structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database. Twice a year, Gramene releases a major build of the database and makes interim releases to correct errors or to make important updates to software and/or data.

  3. Databases for Microbiologists

    DOE PAGES

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  4. Databases for Microbiologists

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhulin, Igor B.

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  5. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  6. Reduction of aflatoxin production by Aspergillus flavus and Aspergillus parasiticus in interaction with Streptomyces.

    PubMed

    Verheecke, C; Liboz, T; Anson, P; Diaz, R; Mathieu, F

    2015-05-01

    The aim of this study is to investigate aflatoxin gene expression during Streptomyces-Aspergillus interaction. Aflatoxins are carcinogenic compounds produced mainly by Aspergillus flavus and Aspergillus parasiticus. A previous study has shown that Streptomyces-A. flavus interaction can reduce aflatoxin content in vitro. Here, we first validated this same effect in the interaction with A. parasiticus. Moreover, we showed that growth reduction and aflatoxin content were correlated in A. parasiticus but not in A. flavus. Secondly, we investigated the mechanisms of action by reverse-transcriptase quantitative PCR. As microbial interaction can lead to variations in expression of household genes, the most stable [act1, βtub (and cox5 for A. parasiticus)] were chosen using geNorm software. To shed light on the mechanisms involved, we studied during the interaction the expression of five genes (aflD, aflM, aflP, aflR and aflS). Overall, the results of aflatoxin gene expression showed that Streptomyces repressed gene expression to a greater level in A. parasiticus than in A. flavus. Expression of aflR and aflS was generally repressed in both Aspergillus species. Expression of aflM was repressed and was correlated with aflatoxin B1 content. The results suggest that aflM expression could be a potential aflatoxin indicator in Streptomyces species interactions. Therefore, we demonstrate that Streptomyces can reduce aflatoxin production by both Aspergillus species and that this effect can be correlated with the repression of aflM expression. © 2015 The Authors.

  7. Ecophysiological characterization of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger isolated from grapes in Spanish vineyards.

    PubMed

    García-Cela, E; Crespo-Sempere, A; Ramos, A J; Sanchis, V; Marin, S

    2014-03-03

    The aim of this study was to evaluate the diversity of black aspergilli isolated from berries from different agroclimatic regions of Spain. Growth characterization (in terms of temperature and water activity requirements) of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger was carried out on synthetic grape medium. A. tubingensis and A. niger showed higher maximum temperatures for growth (>45 °C versus 40-42 °C), and lower minimum aw requirements (0.83 aw versus 0.87 aw) than A. carbonarius. No differences in growth boundaries due to their geographical origin were found within A. niger aggregate isolates. Conversely, A. carbonarius isolates from the hotter and drier region grew and produced OTA at lower aw than other isolates. However, little genetic diversity in A. carbonarius was observed for the microsatellites tested and the same sequence of β-tubulin gene was observed; therefore intraspecific variability did not correlate with the geographical origin of the isolates or with their ability to produce OTA. Climatic change prediction points to drier and hotter climatic scenarios where A. tubingensis and A. niger could be even more prevalent over A. carbonarius, since they are better adapted to extreme high temperature and drier conditions. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Immunoevasive Aspergillus virulence factors.

    PubMed

    Chotirmall, Sanjay H; Mirkovic, Bojana; Lavelle, Gillian M; McElvaney, Noel G

    2014-12-01

    Individuals with structural lung disease or defective immunity are predisposed to Aspergillus-associated disease. Manifestations range from allergic to cavitary or angio-invasive syndromes. Despite daily spore inhalation, immunocompetence facilitates clearance through initiation of innate and adaptive host responses. These include mechanical barriers, phagocyte activation, antimicrobial peptide release and pattern recognition receptor activation. Adaptive responses include Th1 and Th2 approaches. Understanding Aspergillus virulence mechanisms remains critical to the development of effective research and treatment strategies to counteract the fungi. Major virulence factors relate to fungal structure, protease release and allergens; however, mechanisms utilized to evade immune recognition continue to be important in establishing infection. These include the fungal rodlet layer, dihydroxynaphthalene-melanin, detoxifying systems for reactive oxygen species and toxin release. One major immunoevasive toxin, gliotoxin, plays a key role in mediating Aspergillus-associated colonization in the context of cystic fibrosis. Here, it down-regulates vitamin D receptor expression which following itraconazole therapy is rescued concurrent with decreased Th2 cytokine (IL-5 and IL-13) concentrations in the CF airway. This review focuses on the interaction between Aspergillus pathogenic mechanisms, host immune responses and the immunoevasive strategies employed by the organism during disease states such as that observed in cystic fibrosis.

  9. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  10. Analysis of secreted proteins from Aspergillus flavus.

    PubMed

    Medina, Martha L; Haynes, Paul A; Breci, Linda; Francisco, Wilson A

    2005-08-01

    MS/MS techniques in proteomics make possible the identification of proteins from organisms with little or no genome sequence information available. Peptide sequences are obtained from tandem mass spectra by matching peptide mass and fragmentation information to protein sequence information from related organisms, including unannotated genome sequence data. This peptide identification data can then be grouped and reconstructed into protein data. In this study, we have used this approach to study protein secretion by Aspergillus flavus, a filamentous fungus for which very little genome sequence information is available. A. flavus is capable of degrading the flavonoid rutin (quercetin 3-O-glycoside), as the only source of carbon via an extracellular enzyme system. In this continuing study, a proteomic analysis was used to identify secreted proteins from A. flavus when grown on rutin. The growth media glucose and potato dextrose were used to identify differentially expressed secreted proteins. The secreted proteins were analyzed by 1- and 2-DE and MS/MS. A total of 51 unique A. flavus secreted proteins were identified from the three growth conditions. Ten proteins were unique to rutin-, five to glucose- and one to potato dextrose-grown A. flavus. Sixteen secreted proteins were common to all three media. Fourteen identifications were of hypothetical proteins or proteins of unknown functions. To our knowledge, this is the first extensive proteomic study conducted to identify the secreted proteins from a filamentous fungus.

  11. PlantRGDB: A Database of Plant Retrocopied Genes.

    PubMed

    Wang, Yi

    2017-01-01

    RNA-based gene duplication, known as retrocopy, plays important roles in gene origination and genome evolution. The genomes of many plants have been sequenced, offering an opportunity to annotate and mine the retrocopies in plant genomes. However, comprehensive and unified annotation of retrocopies in these plants is still lacking. In this study I constructed the PlantRGDB (Plant Retrocopied Gene DataBase), the first database of plant retrocopies, to provide a putatively complete centralized list of retrocopies in plant genomes. The database is freely accessible at http://probes.pw.usda.gov/plantrgdb or http://aegilops.wheat.ucdavis.edu/plantrgdb. It currently integrates 49 plant species and 38,997 retrocopies along with characterization information. PlantRGDB provides a user-friendly web interface for searching, browsing and downloading the retrocopies in the database. PlantRGDB also offers graphical viewer-integrated sequence information for displaying the structure of each retrocopy. The attributes of the retrocopies of each species are reported using a browse function. In addition, useful tools, such as an advanced search and BLAST, are available to search the database more conveniently. In conclusion, the database will provide a web platform for obtaining valuable insight into the generation of retrocopies and will supplement research on gene duplication and genome evolution in plants. © The Author 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease

    PubMed Central

    Schlüter, Agatha; Fourcade, Stéphane; Domènech-Estévez, Enric; Gabaldón, Toni; Huerta-Cepas, Jaime; Berthommier, Guillaume; Ripp, Raymond; Wanders, Ronald J. A.; Poch, Olivier; Pujol, Aurora

    2007-01-01

    Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database () that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections ‘Genes’, ‘Functions’, ‘Metabolic pathways’ and ‘Diseases’, that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionary conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate system biology approaches on the organelle. PMID:17135190

  13. Aspergillus infections in cystic fibrosis.

    PubMed

    King, Jill; Brunel, Shan F; Warris, Adilia

    2016-07-05

    Patients with cystic fibrosis (CF) suffer from chronic lung infection and airway inflammation. Respiratory failure secondary to chronic or recurrent infection remains the commonest cause of death and accounts for over 90% of mortality. Bacteria as Staphylococcus aureus, Pseudomonas aeruginosa and Burkholderia cepacia complex have been regarded the main CF pathogens and their role in progressive lung decline has been studied extensively. Little attention has been paid to the role of Aspergillus spp. and other filamentous fungi in the pathogenesis of non-ABPA (allergic bronchopulmonary aspergillosis) respiratory disease in CF, despite their frequent recovery in respiratory samples. It has become more apparent however, that Aspergillus spp. may play an important role in chronic lung disease in CF. Research delineating the underlying mechanisms of Aspergillus persistence and infection in the CF lung and its link to lung deterioration is lacking. This review summarizes the Aspergillus disease phenotypes observed in CF, discusses the role of CFTR (cystic fibrosis transmembrane conductance regulator)-protein in innate immune responses and new treatment modalities. Copyright © 2016. Published by Elsevier Ltd.

  14. Use of UHPLC high-resolution Orbitrap mass spectrometry to investigate the genes involved in the production of secondary metabolites in Aspergillus flavus

    USDA-ARS?s Scientific Manuscript database

    The fungus Aspergillus flavus is known for its ability to produce the toxic and carcinogenic aflatoxins in food and feed. While aflatoxins are of most concern, A. flavus is predicted to be capable of producing many more metabolites based on a study of its complete genome sequence. Some of these meta...

  15. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  16. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  17. Entamoeba histolytica: construction and applications of subgenomic databases.

    PubMed

    Hofer, Margit; Duchêne, Michael

    2005-07-01

    Knowledge about the influence of environmental stress such as the action of chemotherapeutic agents on gene expression in Entamoeba histolytica is limited. We plan to use oligonucleotide microarray hybridization to approach these questions. As the basis for our array, sequence data from the genome project carried out by the Institute for Genomic Research (TIGR) and the Sanger Institute were used to annotate parts of the parasite genome. Three subgenomic databases containing enzymes, cytoskeleton genes, and stress genes were compiled with the help of the ExPASy proteomics website and the BLAST servers at the two genome project sites. The known sequences from reference species, mostly human and Escherichia coli, were searched against TIGR and Sanger E. histolytica sequence contigs and the homologs were copied into a Microsoft Access database. In a similar way, two additional databases of cytoskeletal genes and stress genes were generated. Metabolic pathways could be assembled from our enzyme database, but sometimes they were incomplete as is the case for the sterol biosynthesis pathway. The raw databases contained a significant number of duplicate entries which were merged to obtain curated non-redundant databases. This procedure revealed that some E. histolytica genes may have several putative functions. Representative examples such as the case of the delta-aminolevulinate synthase/serine palmitoyltransferase are discussed.

  18. Comparative systems analysis of the secretome of the opportunistic pathogen Aspergillus fumigatus and other Aspergillus species.

    PubMed

    Vivek-Ananth, R P; Mohanraj, Karthikeyan; Vandanashree, Muralidharan; Jhingran, Anupam; Craig, James P; Samal, Areejit

    2018-04-26

    Aspergillus fumigatus and multiple other Aspergillus species cause a wide range of lung infections, collectively termed aspergillosis. Aspergilli are ubiquitous in environment with healthy immune systems routinely eliminating inhaled conidia, however, Aspergilli can become an opportunistic pathogen in immune-compromised patients. The aspergillosis mortality rate and emergence of drug-resistance reveals an urgent need to identify novel targets. Secreted and cell membrane proteins play a critical role in fungal-host interactions and pathogenesis. Using a computational pipeline integrating data from high-throughput experiments and bioinformatic predictions, we have identified secreted and cell membrane proteins in ten Aspergillus species known to cause aspergillosis. Small secreted and effector-like proteins similar to agents of fungal-plant pathogenesis were also identified within each secretome. A comparison with humans revealed that at least 70% of Aspergillus secretomes have no sequence similarity with the human proteome. An analysis of antigenic qualities of Aspergillus proteins revealed that the secretome is significantly more antigenic than cell membrane proteins or the complete proteome. Finally, overlaying an expression dataset, four A. fumigatus proteins upregulated during infection and with available structures, were found to be structurally similar to known drug target proteins in other organisms, and were able to dock in silico with the respective drug.

  19. Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs

    PubMed Central

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h database.nencki-genomics.org –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456

  20. Aspergillus tracheobronchitis in a mild immunocompromised host.

    PubMed

    Cho, Byung Ha; Oh, Youngmin; Kang, Eun Seok; Hong, Yong Joo; Jeong, Hye Won; Lee, Ok-Jun; Chang, You-Jin; Choe, Kang Hyeon; Lee, Ki Man; An, Jin-Young

    2014-11-01

    Aspergillus tracheobronchitis is a form of invasive pulmonary aspergillosis in which the Aspergillus infection is limited predominantly to the tracheobronchial tree. It occurs primarily in severely immunocompromised patients such as lung transplant recipients. Here, we report a case of Aspergillus tracheobronchitis in a 42-year-old man with diabetes mellitus, who presented with intractable cough, lack of expectoration of sputum, and chest discomfort. The patient did not respond to conventional treatment with antibiotics and antitussive agents, and he underwent bronchoscopy that showed multiple, discrete, gelatinous whitish plaques mainly involving the trachea and the left bronchus. On the basis of the bronchoscopic and microbiologic findings, we made the diagnosis of Aspergillus tracheobronchitis and initiated antifungal therapy. He showed gradual improvement in his symptoms and continued taking oral itraconazole for 6 months. Physicians should consider Aspergillus tracheobronchitis as a probable diagnosis in immunocompromised patients presenting with atypical respiratory symptoms and should try to establish a prompt diagnosis.

  1. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  2. Two novel species of Aspergillus section Nigri from indoor air

    USDA-ARS?s Scientific Manuscript database

    Aspergillus collinsii, Aspergillus floridensis, and Aspergillus trinidadensis are described as novel uniseriate species of Aspergillus section Nigri isolated from air samples. To describe the species we used phenotypes from 7-d Czapek yeast extract agar culture (CYA) and malt extract agar culture (M...

  3. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293

    PubMed Central

    Kanhayuwa, Lakkhana; Coutts, Robert H. A.

    2016-01-01

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4–14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140–493 bp and were present in low copy number in the genome and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a possessed a unique feature of repeated trinucleotide ACT residues at its 3’-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified and their 5'-5S rRNA-related regions show 50–65% and 60–75% similarity to respectively A. fumigatus 5S rRNAs and SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259–343 bp and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity. PMID:27736869

  4. In silico analysis of β-mannanases and β-mannosidase from Aspergillus flavus and Trichoderma virens UKM1

    NASA Astrophysics Data System (ADS)

    Yee, Chai Sin; Murad, Abdul Munir Abdul; Bakar, Farah Diba Abu

    2013-11-01

    A gene encoding an endo-β-1,4-mannanase from Trichoderma virens UKM1 (manTV) and Aspergillus flavus UKM1 (manAF) was analysed with bioinformatic tools. In addition, A. flavus NRRL 3357 genome database was screened for a β-mannosidase gene and analysed (mndA-AF). These three genes were analysed to understand their gene properties. manTV and manAF both consists of 1,332-bp and 1,386-bp nucleotides encoding 443 and 461 amino acid residues, respectively. Both the endo-β-1,4-mannanases belong to the glycosyl hydrolase family 5 and contain a carbohydrate-binding module family 1 (CBM1). On the other hand, mndA-AF which is a 2,745-bp gene encodes a protein sequence of 914 amino acid residues. This β-mannosidase belongs to the glycosyl hydrolase family 2. Predicted molecular weight of manTV, manAF and mndA-AF are 47.74 kDa, 49.71 kDa and 103 kDa, respectively. All three predicted protein sequences possessed signal peptide sequence and are highly conserved among other fungal β-mannanases and β-mannosidases.

  5. Discrimination of Aspergillus lentulus from Aspergillus fumigatus by Raman spectroscopy and MALDI-TOF MS.

    PubMed

    Verwer, P E B; van Leeuwen, W B; Girard, V; Monnin, V; van Belkum, A; Staab, J F; Verbrugh, H A; Bakker-Woudenberg, I A J M; van de Sande, W W J

    2014-02-01

    In 2005, a new sibling species of Aspergillus fumigatus was discovered: Aspergillus lentulus. Both species can cause invasive fungal disease in immune-compromised patients. The species are morphologically very similar. Current techniques for identification are PCR-based or morphology-based. These techniques are labour-intense and not sufficiently discriminatory. Since A. lentulus is less susceptible to several antifungal agents, it is important to correctly identify the causative infectious agent in order to optimize antifungal therapy. In this study we determined whether Raman spectroscopy and/or MALDI-TOF MS were able to differentiate between A. lentulus and A. fumigatus. For 16 isolates of A. lentulus and 16 isolates of A. fumigatus, Raman spectra and peptide profiles were obtained using the Spectracell and MALDI-TOF MS (VITEK MS RUO, bioMérieux) respectively. In order to obtain reliable Raman spectra for A. fumigatus and A. lentulus, the culture medium needed to be adjusted to obtain colourless conidia. Only Raman spectra obtained from colourless conidia were reproducible and correctly identified 25 out of 32 (78 %) of the Aspergillus strains. For VITEK MS RUO, no medium adjustments were necessary. Pigmented conidia resulted in reproducible peptide profiles as well in this case. VITEK MS RUO correctly identified 100 % of the Aspergillus isolates, within a timeframe of approximately 54 h including culture. Of the two techniques studied here, VITEK MS RUO was superior to Raman spectroscopy in the discrimination of A. lentulus from A. fumigatus. VITEK MS RUO seems to be a successful technique in the daily identification of Aspergillus spp. within a limited timeframe.

  6. Genomics of peanut leaf-spot pathogens; and RNA-interference-mediated control of aflatoxins

    USDA-ARS?s Scientific Manuscript database

    An overview update of the research done at USDA-ARS National Peanut Research Laboratory will be presented: including: the release of the Cercospora arachidicola genome, sequencing of Cercosporidium personatum, a workflow to study genetic diversity of aflatoxigenic Aspergillus, and progress on the us...

  7. PCR-Internal Transcribed Spacer (ITS) genes sequencing and phylogenetic analysis of clinical and environmental Aspergillus species associated with HIV-TB co infected patients in a hospital in Abeokuta, southwestern Nigeria.

    PubMed

    Shittu, Olufunke Bolatito; Adelaja, Oluwabunmi Molade; Obuotor, Tolulope Mobolaji; Sam-Wobo, Sam Olufemi; Adenaike, Adeyemi Sunday

    2016-03-01

    Aspergillosis has been identified as one of the hospital acquired infections but the contribution of water and inhouse air as possible sources of Aspergillus infection in immunocompromised individuals like HIV-TB patients have not been studied in any hospital setting in Nigeria. To identify and investigate genetic relationship between clinical and environmental Aspergillus sp. associated with HIV-TB co infected patients. DNA extraction, purification, amplification and sequencing of Internal Transcribed Spacer (ITS) genes were performed using standard protocols. Similarity search using BLAST on NCBI was used for species identification and MEGA 5.0 was used for phylogenetic analysis. Analyses of sequenced ITS genes of selected fourteen (14) Aspergillus isolates identified in the GenBank database revealed Aspergillus niger (28.57%), A. tubingensis (7.14%), A. flavus (7.14%) and A. fumigatus (57.14%). Aspergillus in sputum of HIV patients were Aspergillus niger, A. fumigatus, A. tubingensis and A. flavus. Also, A. niger and A. fumigatus were identified from water and open-air. Phylogenetic analysis of sequences yielded genetic relatedness between clinical and environmental isolates. Water and air in health care settings in Nigeria are important sources of Aspergillus sp. for HIV-TB patients.

  8. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    PubMed Central

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized for the numbers of repeats present, their overall length and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32,5 and 44,9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830

  9. Cloning and Genomic Organization of a Rhamnogalacturonase Gene from Locally Isolated Strain of Aspergillus niger.

    PubMed

    Damak, Naourez; Abdeljalil, Salma; Taeib, Noomen Hadj; Gargouri, Ali

    2015-08-01

    The rhg gene encoding a rhamnogalacturonase was isolated from the novel strain A1 of Aspergillus niger. It consists of an ORF of 1.505 kb encoding a putative protein of 446 amino acids with a predicted molecular mass of 47 kDa, belonging to the family 28 of glycosyl hydrolases. The nature and position of amino acids comprising the active site as well as the three-dimensional structure were well conserved between the A. niger CTM10548 and fungal rhamnogalacturonases. The coding region of the rhg gene is interrupted by three short introns of 56 (introns 1 and 3) and 52 (intron 2) bp in length. The comparison of the peptide sequence with A. niger rhg sequences revealed that the A1 rhg should be an endo-rhamnogalacturonases, more homologous to rhg A than rhg B A. niger known enzymes. The comparison of rhg nucleotide sequence from A. niger A1 with rhg A from A. niger shows several base changes. Most of these changes (59 %) are located at the third base of codons suggesting maintaining the same enzyme function. We used the rhamnogalacturonase A from Aspergillus aculeatus as a template to build a structural model of rhg A1 that adopted a right-handed parallel β-helix.

  10. GBshape: a genome browser database for DNA shape annotations

    PubMed Central

    Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J.; Parker, Stephen C.J.; Nuzhdin, Sergey V.; Tullius, Thomas D.; Rohs, Remo

    2015-01-01

    Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species. PMID:25326329

  11. Antibiotic Extraction as a Recent Biocontrol Method for Aspergillus Niger andAspergillus Flavus Fungi in Ancient Egyptian mural paintings

    NASA Astrophysics Data System (ADS)

    Hemdan, R. Elmitwalli; Fatma, Helmi M.; Rizk, Mohammed A.; Hagrassy, Abeer F.

    Biodeterioration of mural paintings by Aspergillus niger and Aspergillus flavus Fungi has been proved in different mural paintings in Egypt nowadays. Several researches have studied the effect of fungi on mural paintings, the mechanism of interaction and methods of control. But none of these researches gives us the solution without causing a side effect. In this paper, for the first time, a recent treatment by antibiotic "6 penthyl α pyrone phenol" was applied as a successful technique for elimination of Aspergillus niger and Aspergillus flavus. On the other hand, it is favorable for cleaning Surfaces of Murals executed by tembera technique from the fungi metabolism which caused a black pigments on surfaces.

  12. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters.

    PubMed

    Blin, Kai; Medema, Marnix H; Kottmann, Renzo; Lee, Sang Yup; Weber, Tilmann

    2017-01-04

    Secondary metabolites produced by microorganisms are the main source of bioactive compounds that are in use as antimicrobial and anticancer drugs, fungicides, herbicides and pesticides. In the last decade, the increasing availability of microbial genomes has established genome mining as a very important method for the identification of their biosynthetic gene clusters (BGCs). One of the most popular tools for this task is antiSMASH. However, so far, antiSMASH is limited to de novo computing results for user-submitted genomes and only partially connects these with BGCs from other organisms. Therefore, we developed the antiSMASH database, a simple but highly useful new resource to browse antiSMASH-annotated BGCs in the currently 3907 bacterial genomes in the database and perform advanced search queries combining multiple search criteria. antiSMASH-DB is available at http://antismash-db.secondarymetabolites.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. The EMBL nucleotide sequence database

    PubMed Central

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

    2001-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039

  14. [Aspergillus spondylodiscitis. Apropos of 5 cases].

    PubMed

    Cortet, B; Deprez, X; Triki, R; Savage, C; Flipo, R M; Duquesnoy, B; Delcambre, B

    1993-01-01

    Five cases of Aspergillus discitis in male patients are reported. Three patients had impaired immune responses as a result of immunosuppressive therapy following a heart transplant (two cases) or hairy cell leukemia (one case). Two patients had a recent history of mycobacterial infection. All five patients were hospitalized for severe spinal pain suggestive of an inflammatory disease with no neurological abnormalities. Erythrocyte sedimentation rate was elevated in every case. The diagnosis of discitis was suspected on spinal roentgenograms and established by computed tomography and/or magnetic resonance imaging. In three patients the spine was the only site of Aspergillus infection (lumbar discitis in two cases and thoracic discitis in one case). One patient developed Aspergillus infection of several disks (L1-L2, L2-L3, and L4-L5) after Aspergillus endocarditis with embolization to the left lower limb. Another patient developed discitis after an Aspergillus lung infection. In every case, Aspergillus fumigatus was recovered in cultures of specimens harvested by a percutaneous needle biopsy of the intervertebral disk. All five patients were treated by itraconazole which was given as single drug therapy in one case and in combination with 5-flucytosine and amphotericin B in four cases. Recovery was achieved in every case after four to six months of this drug therapy. In contrast to most previously reported cases, none of the five patients reported herein required surgical treatment. Efficacy of conservative treatment in this study may be related to the use of itraconazole in every case.(ABSTRACT TRUNCATED AT 250 WORDS)

  15. Identifying and characterizing the most significant β-glucosidase of the novel species Aspergillus saccharolyticus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sorensen, Anette; Ahring, Birgitte K.; Lubeck, Mette

    2012-08-20

    A newly discovered fungal species, Aspergillus saccharolyticus, was found to produce a culture broth rich in beta-glucosidase activity. In this present work, the main beta-glucosidase of A. saccharolyticus responsible for the efficient hydrolytic activity was identified, isolated, and characterized. Ion exchange chromatography was used to fractionate the culture broth, yielding fractions with high beta-glucosidase activity and only one visible band on an SDS-PAGE gel. Mass spectrometry analysis of this band gave peptide matches to beta-glucosidases from aspergilli. Through a PCR approach using degenerate primers and genome walking, a 2919 base pair sequence encoding the 860 amino acid BGL1 polypeptide wasmore » determined. BGL1 of A. saccharolyticus has 91% and 82% identity with BGL1 from Aspergillus aculeatus and BGL1 from Aspergillus niger, respectively, both belonging to Glycoside hydrolase family 3. Homology modeling studies suggested beta-glucosidase activity with preserved retaining mechanism and a wider catalytic pocket compared to other beta-glucosidases. The bgl1 gene was heterologously expressed in Trichoderma reesei QM6a, purified, and characterized by enzyme kinetics studies. The enzyme can hydrolyze cellobiose, pNPG, and cellodextrins. The enzyme showed good thermostability, was stable at 50°C, and at 60°C it had a half-life of approximately 6 hours.« less

  16. Genome sequence of an aflatoxigenic pathogen of Argentinian peanut, Aspergillus arachidicola

    USDA-ARS?s Scientific Manuscript database

    In this study we sequenced the genome of the A. arachidicola Type strain (CBS 117610) and found its genome size to be 38.9 Mb, and its number of predicted genes to be 12,091, which are values comparable to those in other sequenced Aspergilli. Of its predicted genes, 691 were identified as unique to ...

  17. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world’s first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of ‘mono’ repeat (76.82%) over ‘di’ repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait

  18. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world's first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of 'mono' repeat (76.82%) over 'di' repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait of

  19. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing.

    PubMed

    Gao, Guangtu; Nome, Torfinn; Pearse, Devon E; Moen, Thomas; Naish, Kerry A; Thorgaard, Gary H; Lien, Sigbjørn; Palti, Yniv

    2018-01-01

    within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high density SNP array design and for other SNP genotyping platforms used for genetic and genomics studies of this iconic salmonid fish species.

  20. Causative Agents of Aspergillosis Including Cryptic Aspergillus Species and A. fumigatus.

    PubMed

    Toyotome, Takahito

    2016-01-01

    Aspergillosis is an important deep mycosis. The causative agents are Aspergillus fumigatus, Aspergillus flavus, Aspergillus niger, and Aspergillus terreus, of which A. fumigatus is the most prevalent. Cryptic Aspergillus spp., which morphologically resemble representative species of each Aspergillus section, also cause aspergillosis. Most of the cryptic species reveal different susceptibility patterns and/or different secondary metabolite profiles, also called exometabolome in this manuscript, from those representative species. On the other hand, azole-resistant A. fumigatus strains in clinical specimens and in the environment have been reported. Therefore, it is imperative to precisely identify the species, including cryptic Aspergillus spp., and evaluate the susceptibility of isolates.In this manuscript, some of the causative cryptic Aspergillus spp. are briefly reviewed. In addition, the exometabolome of Aspergillus section Fumigati is described. Finally, azole resistance of A. fumigatus is also discussed, in reference to several studies from Japan.