Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies
Denton, James F.; Lugo-Martinez, Jose; Tucker, Abraham E.; Schrider, Daniel R.; Warren, Wesley C.; Hahn, Matthew W.
2014-01-01
Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process. PMID:25474019
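One way to quantify the kind of error described in the abstract above is to compare per-family gene counts between a draft and a higher-quality assembly. The following is a minimal sketch of that comparison, not the authors' pipeline; the function names and the gene-to-family mappings passed in are hypothetical.

```python
from collections import Counter


def family_counts(gene_to_family):
    """Count how many annotated genes fall in each gene family."""
    return Counter(gene_to_family.values())


def fraction_miscounted(draft_counts, reference_counts):
    """Fraction of families whose gene count differs between two assemblies.

    A family missing from one assembly is treated as having zero genes there.
    """
    families = set(draft_counts) | set(reference_counts)
    if not families:
        return 0.0
    wrong = sum(
        1
        for fam in families
        if draft_counts.get(fam, 0) != reference_counts.get(fam, 0)
    )
    return wrong / len(families)


# Hypothetical toy annotations: in the draft, one famA gene is split across
# two contigs (counted twice) and the single famC gene is missing.
reference = {"g1": "famA", "g2": "famA", "g3": "famB", "g4": "famC"}
draft = {"g1a": "famA", "g1b": "famA", "g2": "famA", "g3": "famB"}
error_rate = fraction_miscounted(family_counts(draft), family_counts(reference))
```

In this toy case two of the three families (famA gains a gene, famC loses one) have the wrong count, which illustrates how a draft can both add and subtract genes.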
Extensive error in the number of genes inferred from draft genome assemblies.
Denton, James F; Lugo-Martinez, Jose; Tucker, Abraham E; Schrider, Daniel R; Warren, Wesley C; Hahn, Matthew W
2014-12-01
Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.
Tully, Benjamin J; Graham, Elaina D; Heidelberg, John F
2018-01-16
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans
Tully, Benjamin J.; Graham, Elaina D.; Heidelberg, John F.
2018-01-01
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large. PMID:29337314
CAR: contig assembly of prokaryotic draft genomes using rearrangements.
Lu, Chin Lung; Chen, Kun-Tze; Huang, Shih-Yuan; Chiu, Hsien-Tai
2014-11-28
Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Our experimental results show that CAR performs better than all of these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.
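As a rough illustration of the reference-based ordering and orienting task described above, the sketch below sorts contigs by where they align on a reference and reports their strand. It is a naive baseline under stated assumptions, not CAR's rearrangement-based algorithm; the `Hit` record and its fields are hypothetical stand-ins for the output of whatever aligner is used.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical best alignment of one contig against the reference."""
    contig: str
    ref_start: int  # leftmost reference coordinate of the alignment
    strand: str     # "+" if the contig aligns forward, "-" if reverse-complemented


def order_and_orient(hits: List[Hit]) -> List[Tuple[str, str]]:
    """Return contigs in reference order, each paired with its orientation."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]


# Hypothetical input: three contigs with one best hit each.
hits = [Hit("c2", 5000, "-"), Hit("c1", 100, "+"), Hit("c3", 9000, "+")]
scaffold = order_and_orient(hits)
```

A simple sort like this assumes the draft and reference are collinear; handling rearrangements between related genomes is the harder problem the abstract says CAR addresses.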
ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.
Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P; Chu, Justin; Jackman, Shaun D; Birol, Inanc; Warren, René L
2018-06-20
The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~ 10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
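The Jaccard index mentioned in the ARKS abstract measures how much two sets overlap. The sketch below computes it for the sets of linked-read barcodes seen on two sequence regions; this is only the standard definition, and how ARKS models the relationship between these values and gap size is not described here. The barcode labels are hypothetical.

```python
def jaccard(barcodes_a, barcodes_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two barcode collections.

    Returns 0.0 when both collections are empty.
    """
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical barcodes observed on the ends of two contigs.
left_end = {"bc1", "bc2", "bc3"}
right_end = {"bc2", "bc3", "bc4"}
similarity = jaccard(left_end, right_end)  # 2 shared of 4 distinct barcodes
```

Intuitively, regions that are close together on the same molecule tend to share more barcodes, so a higher index suggests a shorter distance.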
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but the draft genome still contains 187,214 undetermined gap regions, as well as supercontigs and relatively short contigs that are unmapped to chromosomes. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
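Several abstracts in this list report N50 as a contiguity measure. As a minimal sketch of the usual definition (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size), with a hypothetical function name and made-up lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig that brings the cumulative sum to at least half the total.
    Returns 0 for an empty input.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

NG50, used in the ARKS abstract, follows the same idea but measures against an estimated genome size instead of the assembly's own total length.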
Improving draft genome contiguity with reference-derived in silico mate-pair libraries.
Grau, José Horacio; Hackl, Thomas; Koepfli, Klaus-Peter; Hofreiter, Michael
2018-05-01
Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
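To make the idea of an in silico read-pair library concrete, the sketch below cuts fixed-size fragments from a reference sequence and keeps only their two ends. This is a simplified illustration, not the Cross-Species Scaffolding pipeline: the function names, the parameters, and the choice of a forward/reverse-complement read orientation are all assumptions, and real mate-pair libraries may use a different orientation convention.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string containing only A, C, G, T or N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def in_silico_pairs(reference, insert_size, read_len, step):
    """Generate (left, right) read pairs spanning `insert_size` bases.

    Fragments start every `step` bases along the reference. The left read is
    the first `read_len` bases of the fragment; the right read is the
    reverse complement of its last `read_len` bases (assumed orientation).
    """
    if read_len > insert_size or step <= 0:
        raise ValueError("need read_len <= insert_size and step > 0")
    reference = reference.upper()
    pairs = []
    for start in range(0, len(reference) - insert_size + 1, step):
        fragment = reference[start:start + insert_size]
        left = fragment[:read_len]
        right = reverse_complement(fragment[-read_len:])
        pairs.append((left, right))
    return pairs


# Toy reference; realistic insert sizes would be kilobases, not 8 bp.
pairs = in_silico_pairs("AACCGGTTAA", insert_size=8, read_len=3, step=2)
```

Because the pairs come from a related species' reference, the distance information is only as reliable as the collinearity between the two genomes.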
Rivera, Yazmin; Zeller, Kurt; Srivastava, Subodh K; Sutherland, Jeremy; Galvez, Marco E; Nakhla, Mark K; Poniatowska, Anna; Schnabel, Guido; Sundin, George W; Abad, Gloria
2018-05-03
Fungi in the genus Monilinia are known to cause devastating brown rot disease of stone and pome fruits. Here, we report the draft genome assemblies of four important phytopathogenic species: Monilinia fructicola, Monilinia fructigena, Monilinia polystroma, and Monilinia laxa. The draft genome assemblies were 39 Mb (M. fructigena), 42 Mb (M. laxa), 43 Mb (M. fructicola), and 45 Mb (M. polystroma) with as few as 550 contigs (M. laxa). These are the first draft genome resources publicly available for M. laxa, M. fructigena, and M. polystroma.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.
Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information
2014-01-01
Background: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions: The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner. PMID:24950923
Thompson, Sarah M.; Kalamorz, Falk; David, Charles; Addison, Shea M.; Smith, Grant R.
2018-01-01
Here, we report the draft genome sequence of “Candidatus Liberibacter europaeus” ASNZ1, assembled from broom psyllids (Arytainilla spartiophila) from New Zealand. The assembly comprises 15 contigs, with a total length of 1.33 Mb and a G+C content of 33.5%. PMID:29773636
Two Low Coverage Bird Genomes and a Comparison of Reference-Guided versus De Novo Genome Assemblies
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies. PMID:25192061
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.
VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C
2015-11-26
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.
Dudchenko, Olga; Batra, Sanjit S; Omer, Arina D; Nyquist, Sarah K; Hoeger, Marie; Durand, Neva C; Shamim, Muhammad S; Machol, Ido; Lander, Eric S; Aiden, Aviva Presser; Aiden, Erez Lieberman
2017-04-07
The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species. Copyright © 2017, American Association for the Advancement of Science.
Sequencing and De novo Draft Assemblies of the Fathead Minnow (Pimephales promelas) Reference Genome
This study was undertaken to develop genome-scale resources for the fathead minnow (Pimephales promelas), an important model organism widely used in both aquatic ecotoxicology research and in regulatory toxicity testing. We report on the first sequencing and two draft assemblies fo...
De novo genome assembly of the red silk cotton tree (Bombax ceiba).
Gao, Yong; Wang, Haibo; Liu, Chao; Chu, Honglong; Dai, Dongqin; Song, Shengnan; Yu, Long; Han, Lihong; Fu, Yi; Tian, Bin; Tang, Lizhou
2018-05-01
Bombax ceiba L. (the red silk cotton tree) is a large deciduous tree that is distributed in tropical and sub-tropical Asia as well as northern Australia. It has great economic and ecological importance, with several applications in industry and traditional medicine in many Asian countries. To facilitate further utilization of this plant resource, we present here the draft genome sequence for B. ceiba. We assembled a relatively intact genome of B. ceiba by using PacBio single-molecule sequencing and BioNano optical mapping technologies. The final draft genome is approximately 895 Mb long, with contig and scaffold N50 sizes of 1.0 Mb and 2.06 Mb, respectively. The high-quality draft genome assembly of B. ceiba will be a valuable resource enabling further genetic improvement and more effective use of this tree species.
Seuylemezian, Arman; Cooper, Kerry; Schubert, Wayne
2018-01-01
Spore-forming microorganisms are of concern for forward contamination because they can survive harsh interplanetary travel. Here, we report the draft genome sequences of 12 spore-forming strains isolated from the Manned Spacecraft Operations Building (MSOB) and the Vehicle Assembly Building (VAB) in Cape Canaveral, FL, where the Viking spacecraft were assembled. PMID:29567731
Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong
2013-07-01
A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.
GFinisher: a new strategy to refine and finish bacterial genome assemblies
NASA Astrophysics Data System (ADS)
Guizelini, Dieval; Raittz, Roberto T.; Cruz, Leonardo M.; Souza, Emanuel M.; Steffens, Maria B. R.; Pedrosa, Fabio O.
2016-10-01
Despite developments in DNA sequencing technology that have improved the number and length of reads, the reconstruction of complete genome sequences, the so-called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented into contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating Fuzzy GC skew graphs for each contig in an assembly and then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
GFinisher: a new strategy to refine and finish bacterial genome assemblies.
Guizelini, Dieval; Raittz, Roberto T; Cruz, Leonardo M; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O
2016-10-10
Despite developments in DNA sequencing technology that have improved the number and length of reads, the reconstruction of complete genome sequences, the so-called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented into contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating Fuzzy GC skew graphs for each contig in an assembly and then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
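GC skew, the signal GFinisher builds on, is conventionally defined per window as (G − C) / (G + C). The sketch below computes that plain windowed statistic; it is not GFinisher's Fuzzy GC skew, whose details are not given in the abstract, and the function name and window handling are assumptions.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Trailing bases that do not fill a whole window are ignored. A window
    with no G or C is assigned a skew of 0.0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy contig: a G-rich half followed by a C-rich half flips the sign.
profile = gc_skew("GGGGCCCC", window=4)
```

In many bacterial chromosomes the sign of the skew tends to change near the replication origin and terminus, which is why an unexpected sign change inside a contig can hint at a misassembly.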
Positional bias in variant calls against draft reference assemblies.
Briskine, Roman V; Shimizu, Kentaro K
2017-03-28
Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. As the variant calling pipelines are complex and involve many software packages, it is important to understand inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at the positions related to the length of either k-mers or reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of the changes in variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. It is recommended to reduce the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers to improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
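The positional analysis described above, counting how often variants fall at each distance from a contig end, can be sketched as a simple histogram. This is an illustrative reconstruction under stated assumptions, not the authors' code: the input shapes (1-based positions, a dictionary of contig lengths) and the function name are hypothetical.

```python
from collections import Counter


def end_distance_histogram(variants, contig_lengths):
    """Count variants by their distance to the nearest contig end.

    `variants` is an iterable of (contig_name, position) with 1-based
    positions; `contig_lengths` maps contig names to lengths. A variant at
    the first or last base of a contig has distance 1.
    """
    histogram = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} outside contig {contig}")
        histogram[min(pos, length - pos + 1)] += 1
    return histogram


# Hypothetical calls on one 100 bp contig: two variants sit 31 bases from
# an end, the kind of pile-up one might check against the k-mer length used.
lengths = {"contig1": 100}
calls = [("contig1", 31), ("contig1", 70), ("contig1", 50)]
hist = end_distance_histogram(calls, lengths)
```

A spike in such a histogram at a distance matching the assembler's k-mer size or the read length would be consistent with the bias the abstract reports.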
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum
VanBuren, Robert; Bryant, Doug; Edger, Patrick P.; ...
2015-11-11
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
Seuylemezian, Arman; Cooper, Kerry; Schubert, Wayne; Vaishampayan, Parag
2018-03-22
Spore-forming microorganisms are of concern for forward contamination because they can survive harsh interplanetary travel. Here, we report the draft genome sequences of 12 spore-forming strains isolated from the Manned Spacecraft Operations Building (MSOB) and the Vehicle Assembly Building (VAB) in Cape Canaveral, FL, where the Viking spacecraft were assembled. Copyright © 2018 Seuylemezian et al.
Draft genome of a Xanthomonas perforans strain associated with pith necrosis.
Torelli, Emanuela; Aiello, Dalia; Polizzi, Giancarlo; Firrao, Giuseppe; Cirvilleri, Gabriella
2015-02-01
Xanthomonas perforans causes bacterial spot of tomato and pepper. A genome draft of an unusual isolate (strain 4P1S2), differing in that it was associated with stem pith necrosis, was assembled from Illumina MiSeq sequencing data using the draft of X. perforans strain 91-118 as a reference. The resulting draft (accession number JRWW00000000) largely overlapped with the reference draft. In addition, the reads that did not map to the reference assembly were selected and used for a further assembly, which revealed a large putative plasmid. Analysis of the predicted proteins showed only a few gene features that could potentially be implicated in the switch in phytopathological behavior. © FEMS 2015. All rights reserved.
Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne
2017-01-01
Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. PMID:28630114
Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne
2017-08-01
Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. Copyright © 2017 by the Genetics Society of America.
Fathead minnow genome sequencing and assembly
The dataset provides the URLs for accessing the genome sequence data and two draft assemblies as well as fathead minnow genotyping data associated with estimating the heterozygosity of the in-bred line. This dataset is associated with the following publication: Burns, F., L. Cogburn, G. Ankley, D. Villeneuve, E. Waits, Y. Chang, V. Llaca, S. Deschamps, R. Jackson, and R. Hoke. Sequencing and De novo Draft Assemblies of the Fathead Minnow (Pimephales promelas) Reference Genome. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 35(1): 212-217, (2016).
Draft Genome Assembly of a Wolbachia Endosymbiont of Plutella australiana
Ward, Christopher M.
2017-01-01
Wolbachia spp. are endosymbiotic bacteria that infect around 50% of arthropods and cause a broad range of effects, including manipulating host reproduction. Here, we present the annotated draft genome assembly of Wolbachia strain wAus, which infects Plutella australiana, a cryptic ally of the major Brassica pest Plutella xylostella (diamondback moth). PMID:29074653
Draft Genome Assembly of a Wolbachia Endosymbiont of Plutella australiana.
Ward, Christopher M; Baxter, Simon W
2017-10-26
Wolbachia spp. are endosymbiotic bacteria that infect around 50% of arthropods and cause a broad range of effects, including manipulating host reproduction. Here, we present the annotated draft genome assembly of Wolbachia strain wAus, which infects Plutella australiana, a cryptic ally of the major Brassica pest Plutella xylostella (diamondback moth). Copyright © 2017 Ward and Baxter.
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
Draft Genome Sequence of Tolypothrix boutellei Strain VB521301
Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sen, Diya; Bhan, Sushma; Das, Subhadeep; Gupta, Akash
2015-01-01
We report here the draft genome sequence of the filamentous nitrogen-fixing cyanobacterium Tolypothrix boutellei strain VB521301. The organism is lipid rich and hydrophobic and produces polyunsaturated fatty acids which can be harnessed for industrial purpose. The draft genome sequence assembled into 11,572,263 bp with 70 scaffolds and 7,777 protein coding genes. PMID:25700407
Draft genome of the Peruvian scallop Argopecten purpuratus.
Li, Chao; Liu, Xiao; Liu, Bo; Ma, Bin; Liu, Fengqiao; Liu, Guilong; Shi, Qiong; Wang, Chunde
2018-04-01
The Peruvian scallop, Argopecten purpuratus, is mainly cultured in southern Chile and Peru and was introduced into China in the last century. Unlike other Argopecten scallops, the Peruvian scallop normally has a long life span of up to 7 to 10 years. Therefore, researchers have been using it to develop hybrid vigor. Here, we performed whole genome sequencing, assembly, and gene annotation of the Peruvian scallop, with an important aim to develop genomic resources for genetic breeding in scallops. A total of 463.19-Gb raw DNA reads were sequenced. A draft genome assembly of 724.78 Mb was generated (accounting for 81.87% of the estimated genome size of 885.29 Mb), with a contig N50 size of 80.11 kb and a scaffold N50 size of 1.02 Mb. Repeat sequences were calculated to reach 33.74% of the whole genome, and 26,256 protein-coding genes and 3,057 noncoding RNAs were predicted from the assembly. We generated a high-quality draft genome assembly of the Peruvian scallop, which will provide a solid resource for further genetic breeding and for the analysis of the evolutionary history of this economically important scallop.
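Several records in this listing summarize assembly quality with an N50 value and with the fraction of the estimated genome that was assembled. The following is a minimal sketch of how those two statistics are commonly computed; the function names `n50` and `assembled_fraction` and the toy length list are illustrative assumptions, not taken from any of the cited papers.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembled_fraction(assembly_size, estimated_genome_size):
    """Fraction of the estimated genome covered by the assembly (same units)."""
    return assembly_size / estimated_genome_size


# Hypothetical toy lengths (total 300): 80 + 70 = 150 reaches half the total.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))

# Sizes in Mb as reported in the scallop abstract above.
print(round(100 * assembled_fraction(724.78, 885.29), 2))
```

Applied to the sizes quoted in the abstract (724.78 Mb assembled out of an estimated 885.29 Mb), the second function reproduces the reported 81.87%.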
Reducing assembly complexity of microbial genomes with single-molecule sequencing
USDA-ARS's Scientific Manuscript database
Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...
Draft genome sequence of Lactobacillus mali KCTC 3596.
Kim, Dong-Wook; Choi, Sang-Haeng; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dae-Soo; Kim, Ryong Nam; Kim, Aeri; Park, Hong-Seog
2011-09-01
We announce the draft genome sequence of the type strain Lactobacillus mali KCTC 3596 (2,652,969 bp, with a G+C content of 36.0%), which is one of the most prevalent lactic acid bacteria present during the manufacturing process of apple juice. The genome consists of 122 large contigs (>100 bp). All of the contigs were assembled by Newbler Assembler 2.3 (454 Life Science). Copyright © 2011, American Society for Microbiology. All Rights Reserved.
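Genome announcements such as the one above typically report a G+C percentage and a contig count above a length cutoff. As a hedged illustration only, the sketch below shows one common way to derive both from a list of contig sequences; `gc_content` and `large_contigs` are hypothetical helper names, and excluding ambiguous bases such as N from the denominator is an assumption rather than something stated in the abstract.

```python
def gc_content(contigs):
    """Percent G+C over all unambiguous bases (A, C, G, T) in the contigs."""
    gc = 0
    acgt = 0
    for seq in contigs:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        a_t = s.count("A") + s.count("T")
        gc += g_c
        acgt += g_c + a_t
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / acgt


def large_contigs(contigs, min_len=100):
    """Keep contigs strictly longer than min_len bases (the '>100 bp' style cutoff)."""
    return [c for c in contigs if len(c) > min_len]


# Hypothetical toy sequences, not real assembly data.
toy = ["GGCC", "AATT", "gcNNat"]
print(gc_content(toy))
print(len(large_contigs(["A" * 100, "A" * 101])))
```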
Yasui, Yasuo; Hirakawa, Hideki; Ueno, Mariko; Matsui, Katsuhiro; Katsube-Tanaka, Tomoyuki; Yang, Soo Jung; Aii, Jotaro; Sato, Shingo; Mori, Masashi
2016-01-01
Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. PMID:27037832
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although on average of high quality, need to be viewed critically, and the sources of error in them should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Draft Genome Sequence of Tolypothrix boutellei Strain VB521301.
Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sen, Diya; Bhan, Sushma; Das, Subhadeep; Gupta, Akash; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-02-19
We report here the draft genome sequence of the filamentous nitrogen-fixing cyanobacterium Tolypothrix boutellei strain VB521301. The organism is lipid rich and hydrophobic and produces polyunsaturated fatty acids which can be harnessed for industrial purpose. The draft genome sequence assembled into 11,572,263 bp with 70 scaffolds and 7,777 protein coding genes. Copyright © 2015 Chandrababunaidu et al.
Torres, Andrea C; Suárez, Nadia E; Font, Graciela; Saavedra, Lucila; Taranto, María Pía
2016-08-25
We report here the draft genome sequence of Lactobacillus reuteri strain CRL 1098. This strain represents an interesting candidate for functional food development because of its proven probiotic properties. The draft genome sequence is composed of 1,969,471 bp assembled into 45 contigs and an average G+C content of 38.8%. Copyright © 2016 Torres et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Starkenburg, S. R.; Polle, J. E. W.; Hovde, B.
The green alga Scenedesmus obliquus is an emerging platform species for the industrial production of biofuels. Here, we report the draft assembly and annotation for the nuclear, plastid, and mitochondrial genomes of S. obliquus strain DOE0152z.
Starkenburg, S. R.; Polle, J. E. W.; Hovde, B.; ...
2017-08-10
The green alga Scenedesmus obliquus is an emerging platform species for the industrial production of biofuels. Here, we report the draft assembly and annotation for the nuclear, plastid, and mitochondrial genomes of S. obliquus strain DOE0152z.
Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.
Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao
2017-04-01
Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species with economic importance and special appearance. Due to its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and generate a draft genome assembly, aiming at exploring molecular mechanisms of biological interest. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with the scaffold N50 reaching 1.163 Mb. The genome completeness was estimated to be 98.39% by using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43% of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters. © The Authors 2017. Published by Oxford University Press.
A physical map of the bovine genome
Snelling, Warren M; Chiu, Readman; Schein, Jacqueline E; Hobbs, Matthew; Abbey, Colette A; Adelson, David L; Aerts, Jan; Bennett, Gary L; Bosdet, Ian E; Boussaha, Mekki; Brauning, Rudiger; Caetano, Alexandre R; Costa, Marcos M; Crawford, Allan M; Dalrymple, Brian P; Eggen, André; Everts-van der Wind, Annelie; Floriot, Sandrine; Gautier, Mathieu; Gill, Clare A; Green, Ronnie D; Holt, Robert; Jann, Oliver; Jones, Steven JM; Kappes, Steven M; Keele, John W; de Jong, Pieter J; Larkin, Denis M; Lewin, Harris A; McEwan, John C; McKay, Stephanie; Marra, Marco A; Mathewson, Carrie A; Matukumalli, Lakshmi K; Moore, Stephen S; Murdoch, Brenda; Nicholas, Frank W; Osoegawa, Kazutoyo; Roy, Alice; Salih, Hanni; Schibler, Laurent; Schnabel, Robert D; Silveri, Licia; Skow, Loren C; Smith, Timothy PL; Sonstegard, Tad S; Taylor, Jeremy F; Tellam, Ross; Van Tassell, Curtis P; Williams, John L; Womack, James E; Wye, Natasja H; Yang, George; Zhao, Shaying
2007-01-01
Background: Cattle are important agriculturally and relevant as a model organism. Previously described genetic and radiation hybrid (RH) maps of the bovine genome have been used to identify genomic regions and genes affecting specific traits. Application of these maps to identify influential genetic polymorphisms will be enhanced by integration with each other and with bacterial artificial chromosome (BAC) libraries. The BAC libraries and clone maps are essential for the hybrid clone-by-clone/whole-genome shotgun sequencing approach taken by the bovine genome sequencing project. Results: A bovine BAC map was constructed with HindIII restriction digest fragments of 290,797 BAC clones from animals of three different breeds. Comparative mapping of 422,522 BAC end sequences assisted with BAC map ordering and assembly. Genotypes and pedigree from two genetic maps and marker scores from three whole-genome RH panels were consolidated on a 17,254-marker composite map. Sequence similarity allowed integrating the BAC and composite maps with the bovine draft assembly (Btau3.1), establishing a comprehensive resource describing the bovine genome. Agreement between the marker and BAC maps and the draft assembly is high, although discrepancies exist. The composite and BAC maps are more similar than either is to the draft assembly. Conclusion: Further refinement of the maps and greater integration into the genome assembly process may contribute to a high quality assembly. The maps provide resources to associate phenotypic variation with underlying genomic variation, and are crucial resources for understanding the biology underpinning this important ruminant species so closely associated with humans. PMID:17697342
Draft genome of the Northern snakehead, Channa argus.
Xu, Jian; Bian, Chao; Chen, Kunci; Liu, Guiming; Jiang, Yanliang; Luo, Qing; You, Xinxin; Peng, Wenzhu; Li, Jia; Huang, Yu; Yi, Yunhai; Dong, Chuanju; Deng, Hua; Zhang, Songhao; Zhang, Hanyuan; Shi, Qiong; Xu, Peng
2017-04-01
The Northern snakehead (Channa argus), a member of the Channidae family of the Perciformes, is an economically important freshwater fish native to East Asia. In North America, it has become notorious as an intentionally released invasive species. Its ability to breathe air with gills and migrate short distances over land makes it a good model for research on bimodal breathing. Therefore, recent research has focused on the identification of relevant candidate genes. Here, we performed whole genome sequencing of C. argus to construct its draft genome, aiming to offer useful information for further functional studies and identification of target genes related to its unusual facultative air breathing. Findings: We assembled the C. argus genome with a total of 140.3 Gb of raw reads, which were sequenced using the Illumina HiSeq2000 platform. The final draft genome assembly was approximately 615.3 Mb, with a contig N50 of 81.4 kb and scaffold N50 of 4.5 Mb. The identified repeat sequences account for 18.9% of the whole genome. A total of 19 877 protein-coding genes were predicted from the genome assembly, with an average of 10.5 exons per gene. Conclusion: We generated a high-quality draft genome of C. argus, which will provide a valuable genetic resource for further biomedical investigations of this economically important teleost fish. © The Author 2017. Published by Oxford University Press.
Sen, Diya; Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sanghi, Neha; Ghorai, Arpita; Mishra, Gyan Prakash; Madduluri, Madhavi
2015-01-01
We report here the draft genome sequence of Scytonema millei VB511283, a cyanobacterium isolated from biofilms on the exterior of stone monuments in Santiniketan, eastern India. The draft genome is 11,627,246 bp long (11.63 Mb), with 118 scaffolds. About 9,011 protein-coding genes, 117 tRNAs, and 12 rRNAs are predicted from this assembly. PMID:25744984
USDA-ARS's Scientific Manuscript database
We announce the draft genome assembly of Lactococcus garvieae str. PAQ102015-99, a recently isolated strain from an outbreak of lactococcosis at a commercial trout farm in the Northwestern US. The draft genome comprises 14 contigs totaling 2,068,357 bp with an N50 of 496,618 bp and average G+C conte...
Kisand, Veljo; Lettieri, Teresa
2013-04-01
De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. 
Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
Moll, Karen M; Zhou, Peng; Ramaraj, Thiruvarangan; Fajardo, Diego; Devitt, Nicholas P; Sadowsky, Michael J; Stupar, Robert M; Tiffin, Peter; Miller, Jason R; Young, Nevin D; Silverstein, Kevin A T; Mudge, Joann
2017-08-04
Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.
Draft Genome Sequence of Lactobacillus plantarum Strain IPLA 88
Ladero, Victor; Alvarez-Sieiro, Patricia; Redruello, Begoña; del Rio, Beatriz; Linares, Daniel M.; Martin, M. Cruz; Fernández, María
2013-01-01
Here, we report a 3.2-Mbp draft assembly for the genome of Lactobacillus plantarum IPLA 88. The sequence of this sourdough isolate provides insight into the adaptation of this versatile species to different environments. PMID:23887921
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.
Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D
2012-06-07
Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.
Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou
2016-09-01
Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome, one by de novo assembly from shotgun sequencing reads and one by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome scale collinearity was complemented with EST based synteny which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
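The idea of reporting regions supported by two independent assemblies as a fraction of the genome can be illustrated with a simple interval intersection. This is only a sketch under strong assumptions: it presumes that the regions supported by each assembly have already been placed on a shared coordinate system as sorted, non-overlapping half-open intervals, which the abstract does not describe, and the names `intersect_intervals` and `covered_fraction` are hypothetical rather than the authors' method.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals.

    Each interval is a (start, end) tuple with start < end. The result is
    the list of regions present in both inputs.
    """
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def covered_fraction(intervals, total_length):
    """Fraction of total_length covered by the given intervals."""
    return sum(end - start for start, end in intervals) / total_length


# Hypothetical coordinates on a 400-base toy sequence.
assembly_a = [(0, 100), (150, 300)]
assembly_b = [(50, 200), (250, 400)]
consensus = intersect_intervals(assembly_a, assembly_b)
print(consensus)
print(covered_fraction(consensus, 400))
```

A real comparison would start from whole-genome alignments between the two assemblies; the interval step shown here would only be the final bookkeeping.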
Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N
2016-01-01
For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
Yasui, Yasuo; Hirakawa, Hideki; Ueno, Mariko; Matsui, Katsuhiro; Katsube-Tanaka, Tomoyuki; Yang, Soo Jung; Aii, Jotaro; Sato, Shingo; Mori, Masashi
2016-06-01
Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
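Several records in this listing, including the one above, summarize assembly contiguity with N50 (and elsewhere L50). As a minimal sketch, assuming the usual definitions (N50 is the length of the sequence at which the running total of length-sorted sequences first reaches half the assembly size; L50 is how many sequences that takes), the calculation could look like this. The function name and the toy lengths are illustrative and do not come from any of the cited papers.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half = sum(ordered) / 2                   # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is N50; 'count' is L50
            return length, count


# Toy example (hypothetical lengths, total = 300):
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))
```

A larger N50 (or a smaller L50) indicates a more contiguous assembly, but neither statistic says anything about correctness, which is why many of these abstracts also report gene-content completeness.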
Draft Genome Sequence of Gordonia sp. Strain UCD-TK1 (Phylum Actinobacteria)
Koenigsaecker, Tynisha M.; Coil, David A.
2016-01-01
Here, we present the draft genome of Gordonia sp. strain UCD-TK1. The assembly contains 5,470,576 bp in 98 contigs. This strain was isolated from a disinfected ambulatory surgery center. PMID:27738036
Sen, Diya; Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sanghi, Neha; Ghorai, Arpita; Mishra, Gyan Prakash; Madduluri, Madhavi; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-03-05
We report here the draft genome sequence of Scytonema millei VB511283, a cyanobacterium isolated from biofilms on the exterior of stone monuments in Santiniketan, eastern India. The draft genome is 11,627,246 bp long (11.63 Mb), with 118 scaffolds. About 9,011 protein-coding genes, 117 tRNAs, and 12 rRNAs are predicted from this assembly. Copyright © 2015 Sen et al.
Tamariz, Jesus; Llanos, Carlos; Seas, Carlos; Montenegro, Paola; Lagos, Jose; Fernandes, Miriam R; Cerdeira, Louise; Lincopan, Nilton
2018-03-29
We present here the draft genome sequence of the first New Delhi metallo-β-lactamase (NDM-1)-producing Escherichia coli strain, belonging to sequence type 155 (ST155), isolated in Peru. Assembly of this draft genome resulted in 5,061,184 bp, revealing a clinically significant resistome for β-lactams, aminoglycosides, tetracyclines, phenicols, sulfonamides, trimethoprim, and fluoroquinolones. Copyright © 2018 Tamariz et al.
Draft genome sequence of Dactylonectria macrodidyma, a plant pathogenic fungus in the Nectriaceae
USDA-ARS's Scientific Manuscript database
Dactylonectria macrodidyma is part of the Nectriaceae, a family containing important plant pathogens. This species possesses the ability to induce disease on grapevine, avocado and olive. Here, we report the first draft genome of D. macrodidyma isolate JAC15-08. The assembled genome was 58 Mbp and c...
The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)
G.A. Tuskan; S. DiFazio; S. Jansson; J. Bohlmann; I. Grigoriev; U. Hellsten; N. Putnam; S. Ralph; S. Rombauts; A. Salamov; J. Schein; L. Sterck; A. Aerts; R.R. Bhalerao; R.P. Bhalerao; D. Blaudez; W. Boerjan; A. Brun; A. Brunner; V. Busov; M. Campbell; J. Carlson; M. Chalot; J. Chapman; G.-L. Chen; D. Cooper; P.M. Coutinho; J. Couturier; S. Covert; Q. Cronk; R. Cunningham; J. Davis; S. Degroeve; A. Dejardin; C. dePamphilis; J. Detter; B. Dirks; U. Dubchak; S. Duplessis; J. Ehlting; B. Ellis; K. Gendler; D. Goodstein; M. Gribskov; J. Grimwood; A. Groover; L. Gunter; B. Hamberger; B. Heinze; Y. Helariutta; B. Henrissat; D. Holligan; R. Holt; W. Huang; N. Islam-Faridi; S. Jones; M. Jones-Rhoades; R. Jorgensen; C. Joshi; J. Kangasjarvi; J. Karlsson; C. Kelleher; R. Kirkpatrick; M. Kirst; A. Kohler; U. Kalluri; F. Larimer; J. Leebens-Mack; J.-C. Leple; P. Locascio; Y. Lou; S. Lucas; F. Martin; B. Montanini; C. Napoli; D.R. Nelson; C. Nelson; K. Nieminen; O. Nilsson; V. Pereda; G. Peter; R. Philippe; G. Pilate; A. Poliakov; J. Razumovskaya; P. Richardson; C. Rinaldi; K. Ritland; P. Rouze; D. Ryaboy; J. Schumtz; J. Schrader; B. Segerman; H. Shin; A. Siddiqui; F. Sterky; A. Terry; C.-J. Tsai; E. Uberbacher; P. Unneberg; J. Vahala; K. Wall; S. Wessler; G. Yang; T. Yin; C. Douglas; M. Marra; G. Sandberg; Y. Van de Peer; D. Rokhsar
2006-01-01
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs...
The draft genome sequence of cork oak
Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia
2018-01-01
Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species. PMID:29786699
The draft genome sequence of cork oak.
Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J M; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M; Oliveira, M Margarida; Ricardo, Cândido P; Gonçalves, Sónia
2018-05-22
Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species.
USDA-ARS's Scientific Manuscript database
The genome of the cattle tick R. microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp and consists of ~70% repetitive DNA. We report the first assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genom...
Guida, Brandon S; Garcia-Pichel, Ferran
2016-01-28
Mastigocoleus testarum strain BC008 is a model organism used to study marine photoautotrophic carbonate dissolution. It is a multicellular, filamentous, diazotrophic, euendolithic cyanobacterium ubiquitously found in marine benthic environments. We present an accurate draft genome assembly of 172 contigs spanning 12,700,239 bp with 9,131 annotated genes with an average G+C% of 37.3. Copyright © 2016 Guida and Garcia-Pichel.
Batista, Thiago M; Moreira, Rennan G; Hilário, Heron O; Morais, Camila G; Franco, Glória R; Rosa, Luiz H; Rosa, Carlos A
2017-03-01
We present the draft genome sequence of the type strain of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T = CBS 12683 T), a xylan-degrading species capable of fermenting d-xylose to ethanol. The assembled genome has a size of ~13.7 Mb and a GC content of 33.8% and contains 5971 protein-coding genes. We identified 15 genes with significant similarity to the d-xylose reductase gene from several other fungal species. The draft genome assembled from whole-genome shotgun sequencing of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T = CBS 12683 T) has been deposited at DDBJ/ENA/GenBank under the accession number MQSX00000000 under version MQSX01000000.
Sharma, Sandeep; Zaccaron, Alex Z; Ridenour, John B; Allen, Tom W; Conner, Kassie; Doyle, Vinson P; Price, Trey; Sikora, Edward; Singh, Raghuwinder; Spurlock, Terry; Tomaso-Peterson, Maria; Wilkerson, Tessie; Bluhm, Burton H
2018-04-01
The draft genome of Xylaria sp. isolate MSU_SB201401, causal agent of taproot decline of soybean in the southern U.S., is presented here. The genome assembly was 56.7 Mb in size with an L50 of 246. A total of 10,880 putative protein-encoding genes were predicted, including 647 genes encoding carbohydrate-active enzymes and 1053 genes encoding secreted proteins. This is the first draft genome of a plant-pathogenic Xylaria sp. associated with soybean. The draft genome of Xylaria sp. isolate MSU_SB201401 will provide an important resource for future experiments to determine the molecular basis of pathogenesis.
Melo, Ricardo Rodrigues de; Persinoti, Gabriela Felix; Paixão, Douglas Antonio Alvaredo; Squina, Fábio Márcio; Ruller, Roberto; Sato, Helia Harumi
Here, we present the draft genome sequence of Streptomyces sp. F1, a strain isolated from soil with great potential for secretion of hydrolytic enzymes used to deconstruct cellulosic biomass. The draft genome assembly of Streptomyces sp. strain F1 has 69 contigs with a total genome size of 8,142,296 bp and a G+C content of 72.65%. Preliminary genome analysis identified 175 proteins as Carbohydrate-Active Enzymes, including 85 glycoside hydrolases organized in 33 distinct families. This draft genome information provides new insights into the key genes encoding hydrolytic enzymes involved in biomass deconstruction employed by soil bacteria. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
The draft genome of a diploid cotton Gossypium raimondii
USDA-ARS's Scientific Manuscript database
We have sequenced and assembled the draft genome of Gossypium raimondii, whose progenitor is considered the contributor of the D-subgenome to the economically important natural textile fiber producer, G. hirsutum. Next-generation Illumina paired-end (PE) sequencing strategies were employed to obtain ...
Draft Genome Sequence of “Cohnella kolymensis” B-2846
Kudryashova, Ekaterina B.; Ariskina, Elena V.
2016-01-01
A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947
Yakym, Christopher J.; Helmkampf, Martin; Hagiwara, Kehau; Ip, Courtney G.; Antonio, Brandi J.; Armstrong, Ellie; Ulloa, Wesley J.; Awaya, Jonathan D.
2016-01-01
We report here the 6.0-Mb draft genome assembly of Pseudoalteromonas luteoviolacea strain IPB1 that was isolated from the Hawaiian marine sponge Iotrochota protea. Genome mining complemented with bioassay studies will elucidate secondary metabolite biosynthetic pathways and will help explain the ecological interaction between host sponge and microorganism. PMID:27660784
Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; ...
2014-06-14
Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. Availability and implementation: all assembly tools except CLC Genomics Workbench are freely available under the GNU General Public License.
ABACAS: algorithm-based automatic contiguation of assembled sequences
Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew
2009-01-01
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936
Das, Subhadeep; Singh, Deeksha; Madduluri, Madhavi; Chandrababunaidu, Mathu Malar; Gupta, Akash
2015-01-01
We report here the draft genome sequence of Tolypothrix campylonemoides VB511288, isolated from building facades in Santiniketan, India. The members of this genus produce several compounds of commercial importance. The draft assembly is 10,627,177 bases in 135 scaffolds, and it contains 7,886 protein-coding genes, 994 pseudogenes, 18 rRNA genes, and 76 tRNA genes. PMID:25838485
Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D
Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. 
Copyright © 2018 Elsevier Inc. All rights reserved.
An Annotated Draft Genome for Radix auricularia (Gastropoda, Mollusca)
Feldmeyer, Barbara; Schmidt, Hanno; Greshake, Bastian; Tills, Oliver; Truebano, Manuela; Rundle, Simon D.; Paule, Juraj; Ebersberger, Ingo; Pfenninger, Markus
2017-01-01
Molluscs are the second most species-rich phylum in the animal kingdom, yet only 11 genomes of this group have been published so far. Here, we present the draft genome sequence of the pulmonate freshwater snail Radix auricularia. Six whole genome shotgun libraries with different layouts were sequenced. The resulting assembly comprises 4,823 scaffolds with a cumulative length of 910 Mb and an overall read coverage of 72×. The assembly contains 94.6% of a metazoan core gene collection, indicating an almost complete coverage of the coding fraction. The discrepancy of ∼690 Mb compared with the estimated genome size of R. auricularia (1.6 Gb) results from a high repeat content of 70% mainly comprising DNA transposons. The annotation of 17,338 protein coding genes was supported by the use of publicly available transcriptome data. This draft will serve as starting point for further genomic and population genetic research in this scientifically important phylum. PMID:28204581
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lapidus, Alla L.
From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
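To put the two error rates quoted above into perspective, here is a small back-of-the-envelope sketch. It assumes errors are spread uniformly along the consensus, which is a simplification, and the 5 Mb genome size is a hypothetical value chosen for illustration, not a figure from the abstract.

```python
def expected_errors(genome_bp, bp_per_error):
    """Expected number of consensus errors, assuming one error per bp_per_error bases."""
    if genome_bp < 0 or bp_per_error <= 0:
        raise ValueError("genome_bp must be >= 0 and bp_per_error must be > 0")
    return genome_bp / bp_per_error


GENOME_BP = 5_000_000          # hypothetical 5 Mb bacterial genome (assumption)
DRAFT_BP_PER_ERROR = 2_000     # draft rate quoted in the abstract: about 1/2000 bp
FINISHED_BP_PER_ERROR = 10_000 # finished rate quoted in the abstract: ~1/10,000 bp

draft = expected_errors(GENOME_BP, DRAFT_BP_PER_ERROR)
finished = expected_errors(GENOME_BP, FINISHED_BP_PER_ERROR)
print(f"draft: {draft:.0f} errors, finished: {finished:.0f} errors")
```

Under these assumptions, finishing reduces the expected number of consensus errors fivefold, independent of genome size.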
Genome assemblies for 11 Yersinia pestis strains isolated in the Caucasus region
Zhgenti, Ekaterine; Johnson, Shannon L.; Davenport, Karen W.; ...
2015-09-17
Yersinia pestis, the causative agent of plague, is endemic to the Caucasus region but few reference strain genome sequences from that region are available. We present the improved draft or finished assembled genomes from 11 strains isolated in the nation of Georgia and surrounding countries.
Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria)
Bendiks, Zachary A.; Lang, Jenna M.; Darling, Aaron E.; Coil, David A.
2013-01-01
Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment. PMID:23516225
Draft Genome Sequence of Lactobacillus paracasei DmW181, a Bacterium Isolated from Wild Drosophila.
Hammer, Austin J; Walters, Amber; Carroll, Courtney; Newell, Peter D; Chaston, John M
2017-07-06
The draft genome sequence of Lactobacillus paracasei DmW181, an anaerobic bacterium isolated from wild Drosophila flies, is reported here. Strain DmW181 possesses genes for sialic acid and mannose metabolism. The assembled genome is 3,201,429 bp, with 3,454 predicted genes. Copyright © 2017 Hammer et al.
Draft Genome Sequence of the Putrescine-Producing Strain Lactococcus lactis subsp. lactis 1AA59
del Rio, Beatriz; Linares, Daniel M.; Fernandez, María; Mayo, Baltasar; Martín, M. Cruz
2015-01-01
We report here the 2,576,542-bp annotated draft genome assembly of Lactococcus lactis subsp. lactis 1AA59. This strain, isolated from a traditional cheese, produces putrescine, one of the biogenic amines most frequently found in dairy products. PMID:26089428
Use of a draft genome of coffee (Coffea arabica) to identify SNPs associated with caffeine content.
Tran, Hue T M; Ramaraj, Thiruvarangan; Furtado, Agnelo; Lee, Leonard Slade; Henry, Robert J
2018-03-07
Arabica coffee (Coffea arabica) has a small gene pool limiting genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools resulting in 76 409 scaffolds with a scaffold N50 of 54 544 bp and a total scaffold length of 1448 Mb. Validation of the genome assembly using different tools showed high completeness of the genome. More than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. The assembled genome annotated using AUGUSTUS yielded 99 829 gene models. Using the draft arabica genome as reference in mapping and variant calling allowed the detection of 1444 nonsynonymous single nucleotide polymorphisms (SNPs) associated with caffeine content. Based on Kyoto Encyclopaedia of Genes and Genomes pathway-based analysis, 65 caffeine-associated SNPs were discovered, among which 11 SNPs were associated with genes encoding enzymes involved in the conversion of substrates, which participate in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.
Nowell, Reuben W; Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J; Wheat, Christopher W; Saastamoinen, Marjo; Saccheri, Ilik J; Van't Hof, Arjen E; Wasik, Bethany R; Connahs, Heidi; Aslam, Muhammad L; Kumar, Sujai; Challis, Richard J; Monteiro, Antónia; Brakefield, Paul M; Blaxter, Mark
2017-07-01
The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). © The Authors 2017. Published by Oxford University Press.
A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana
Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J.; Wheat, Christopher W.; Saastamoinen, Marjo; Saccheri, Ilik J.; van’t Hof, Arjen E.; Wasik, Bethany R.; Connahs, Heidi; Aslam, Muhammad L.; Kumar, Sujai; Challis, Richard J.; Monteiro, Antónia; Brakefield, Paul M.
2017-01-01
The mycalesine butterfly Bicyclus anynana, the “Squinting bush brown,” is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). PMID:28486658
Das, Subhadeep; Singh, Deeksha; Madduluri, Madhavi; Chandrababunaidu, Mathu Malar; Gupta, Akash; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-04-02
We report here the draft genome sequence of Tolypothrix campylonemoides VB511288, isolated from building facades in Santiniketan, India. The members of this genus produce several compounds of commercial importance. The draft assembly is 10,627,177 bases in 135 scaffolds, and it contains 7,886 protein-coding genes, 994 pseudogenes, 18 rRNA genes, and 76 tRNA genes. Copyright © 2015 Das et al.
HopBase: a unified resource for Humulus genomics
Hill, Steven T.; Sudarsanam, Ramcharan
2017-01-01
Hop (Humulus lupulus L. var lupulus) is a dioecious plant of worldwide significance, used primarily for bittering and flavoring in brewing beer. Studies on the medicinal properties of several unique compounds produced by hop have led to additional interest from the pharmacy and healthcare industries, as well as from livestock production, where hop is of interest as a natural antibiotic. Genomic research in hop has resulted in a published draft genome and transcriptome assemblies. As research into the genomics of hop has gained interest, there is a critical need for centralized online genomic resources. To support the growing research community, we report the development of an online resource, "HopBase.org." In addition to providing a gene annotation for the existing Shinsuwase draft genome, HopBase makes available genome assemblies and annotations for both the cultivar “Teamaker” and male hop accession number USDA 21422M. These genome assemblies and gene annotations, along with other common data, coupled with a genome browser and BLAST database, enable the hop community to enter the genomic age. The HopBase genomic resource is accessible at http://hopbase.org and http://hopbase.cgrb.oregonstate.edu. PMID:28415075
Draft genome of tule elk Cervus canadensis nannodes.
Mizzi, Jessica E; Lounsberry, Zachary T; Brown, C Titus; Sacks, Benjamin N
2017-01-01
This paper presents the first draft genome of the tule elk (Cervus elaphus nannodes), a subspecies native to California that underwent an extreme genetic bottleneck in the late 1800s. The genome was generated from Illumina HiSeq 3000 whole genome sequencing of four individuals, resulting in the assembly of 2.395 billion base pairs (Gbp) across 602,862 contigs longer than 500 bp, with N50 = 6,885 bp. This genome provides a resource to facilitate future genomic research on elk and other cervids.
Dramatic improvement in genome assembly achieved using doubled-haploid genomes.
Zhang, Hong; Tan, Engkong; Suzuki, Yutaka; Hirose, Yusuke; Kinoshita, Shigeharu; Okano, Hideyuki; Kudoh, Jun; Shimizu, Atsushi; Saito, Kazuyoshi; Watabe, Shugo; Asakawa, Shuichi
2014-10-27
Improvement in de novo assembly of large genomes is still to be desired. Here, we improved draft genome sequence quality by employing doubled-haploid individuals. We sequenced wildtype and doubled-haploid Takifugu rubripes genomes, under the same conditions, using the Illumina platform and assembled contigs with SOAPdenovo2. We observed 5.4-fold and 2.6-fold improvement in the sizes of the N50 contig and scaffold of doubled-haploid individuals, respectively, compared to the wildtype, indicating that the use of a doubled-haploid genome aids in accurate genome analysis.
Gschloessl, B; Dorkeld, F; Berges, H; Beydon, G; Bouchez, O; Branco, M; Bretaudeau, A; Burban, C; Dubois, E; Gauthier, P; Lhuillier, E; Nichols, J; Nidelet, S; Rocha, S; Sauné, L; Streiff, R; Gautier, M; Kerdelhué, C
2018-05-01
The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows a high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are an obligate prerequisite to understand their underlying genetic architecture. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the CEGMA and 88% of the BUSCO highly conserved eukaryotic genes and 84% of the BUSCO arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (http://bipaa.genouest.org/sp/thaumetopoea_pityocampa/). © 2018 John Wiley & Sons Ltd.
Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.
Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A
2016-01-01
One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
USDA-ARS's Scientific Manuscript database
Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we ...
Chiriac, Cecilia; Baricz, Andreea
2018-01-01
ABSTRACT The draft genome assembly of Janthinobacterium sp. strain ROICE36 has 207 contigs, with a total genome size of 5,977,006 bp and a G+C content of 62%. Preliminary genome analysis identified 5,363 protein-coding genes and a total of 7 secondary metabolic gene clusters (encoding bacteriocins, nonribosomal peptide-synthetase [NRPS], terpene, hserlactone, and other ketide synthases). PMID:29650588
A draft annotation and overview of the human genome
Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo
2001-01-01
Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Genome Improvement at JGI-HAGSC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.
Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.
Singh, Deeksha; Chandrababunaidu, Mathu Malar; Panda, Arijit; Sen, Diya; Bhattacharyya, Sourav
2015-01-01
The draft genome assembly of Hassallia byssoidea strain VB512170 with a genome size of ~13 Mb and 10,183 protein-coding genes in 62 scaffolds is reported here for the first time. This is a terrestrial hydrophobic cyanobacterium isolated from monuments in India. We report several copies of luciferase and antibiotic genes in this organism. PMID:25745001
Tanjung, Zulfikar Achmad; Aditama, Redi; Buana, Rika Fithri Nurani; Pratomo, Antonius Dony Madu; Tryono, Reno; Liwang, Tony
2018-01-01
ABSTRACT Ganoderma boninense is the dominant fungal pathogen of basal stem rot (BSR) disease on Elaeis guineensis. We sequenced the nuclear genome of mycelia using both Illumina and Pacific Biosciences platforms for assembly of scaffolds. The draft genome comprised 79.24 Mb, 495 scaffolds, and 26,226 predicted coding sequences. PMID:29700132
Sakai-Kawada, Francis E; Yakym, Christopher J; Helmkampf, Martin; Hagiwara, Kehau; Ip, Courtney G; Antonio, Brandi J; Armstrong, Ellie; Ulloa, Wesley J; Awaya, Jonathan D
2016-09-22
We report here the 6.0-Mb draft genome assembly of Pseudoalteromonas luteoviolacea strain IPB1 that was isolated from the Hawaiian marine sponge Iotrochota protea. Genome mining complemented with bioassay studies will elucidate secondary metabolite biosynthetic pathways and will help explain the ecological interaction between host sponge and microorganism. Copyright © 2016 Sakai-Kawada et al.
Draft Genome Sequence of Escherichia coli K-12 (ATCC 10798).
Dimitrova, Daniela; Engelbrecht, Kathleen C; Putonti, Catherine; Koenig, David W; Wolfe, Alan J
2017-07-06
Here, we present the draft genome sequence of Escherichia coli ATCC 10798. E. coli ATCC 10798 is a K-12 strain, one of the most well-studied model microorganisms. The size of the genome was 4,685,496 bp, with a G+C content of 50.70%. This assembly consists of 62 contigs and the F plasmid. Copyright © 2017 Dimitrova et al.
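Many of the genome announcements in this listing report a G+C content alongside assembly size. The sketch below shows one common way such a percentage can be computed from assembled contigs, assuming that ambiguous bases such as N are excluded from the denominator (conventions differ between tools). The function names and the toy sequences are hypothetical and do not come from any record here.

```python
from typing import Iterable


def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored entirely.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def assembly_gc_content(contigs: Iterable[str]) -> float:
    """G+C percentage pooled over all contigs (length-weighted)."""
    return gc_content("".join(contigs))


# Hypothetical toy contigs, for illustration only.
toy_contigs = ["GGCCAATT", "GCNNAT", "ggcc"]
print(f"pooled G+C content: {assembly_gc_content(toy_contigs):.2f}%")
```

Pooling the bases before dividing weights each contig by its length, which is usually what a single genome-wide figure is meant to convey.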
Draft Genome Sequence of Escherichia coli K-12 (ATCC 10798)
Dimitrova, Daniela; Engelbrecht, Kathleen C.; Koenig, David W.; Wolfe, Alan J.
2017-01-01
ABSTRACT Here, we present the draft genome sequence of Escherichia coli ATCC 10798. E. coli ATCC 10798 is a K-12 strain, one of the most well-studied model microorganisms. The size of the genome was 4,685,496 bp, with a G+C content of 50.70%. This assembly consists of 62 contigs and the F plasmid. PMID:28684574
USDA-ARS's Scientific Manuscript database
Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft geno...
Draft Genomes for Eight Burkholderia mallei Isolates from Turkey
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daligault, H. E.; Johnson, Shannon L.; Davenport, K. W.
Burkholderia mallei, the etiologic agent of glanders, is a Gram-negative, nonmotile, facultative intracellular pathogen. Though glanders has been eradicated from many parts of the world, the threat of B. mallei being used as a weapon is very real. We, then, present draft genome assemblies of 8 Burkholderia mallei strains that were isolated in Turkey.
Draft Genomes for Eight Burkholderia mallei Isolates from Turkey
Daligault, H. E.; Johnson, Shannon L.; Davenport, K. W.; ...
2016-01-07
Burkholderia mallei, the etiologic agent of glanders, is a Gram-negative, nonmotile, facultative intracellular pathogen. Though glanders has been eradicated from many parts of the world, the threat of B. mallei being used as a weapon is very real. We, then, present draft genome assemblies of 8 Burkholderia mallei strains that were isolated in Turkey.
Single molecule sequencing-guided scaffolding and correction of draft assemblies.
Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J
2017-12-06
Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies in the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
Finishing bacterial genome assemblies with Mix.
Soueidan, Hayssam; Maurier, Florence; Groppi, Alexis; Sirand-Pugnet, Pascal; Tardy, Florence; Citti, Christine; Dupuy, Virginie; Nikolski, Macha
2013-01-01
Among challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for genome de novo assembly are available, each having its advantages and drawbacks, without clear guidelines as to how to choose among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase. In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies, without relying on a reference genome and having the goal to reduce contig fragmentation and thus speed up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length. We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. Resulting final assemblies demonstrate a significant improvement in the overall assembly quality. In particular, Mix is consistent by providing better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects. Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
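The extension graph described in the abstract above can be pictured with a small data-structure sketch. This is not the Mix implementation and does not use its API: it is a simplified toy, assuming every contig is in forward orientation, that an alignment is just a pair of contig ends, and that overlaps are ignored when lengths are summed. All names (`End`, `build_extension_graph`, `is_supported`, `cumulative_length`) and the example contigs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A vertex is one extremity of a contig: (contig name, "L" or "R").
End = Tuple[str, str]


def build_extension_graph(alignments: List[Tuple[End, End]]) -> Dict[End, Set[End]]:
    """Undirected graph whose edges are alignments between contig ends."""
    graph: Dict[End, Set[End]] = defaultdict(set)
    for a, b in alignments:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def is_supported(path: List[str], graph: Dict[End, Set[End]]) -> bool:
    """True if each consecutive contig pair is joined right-end to left-end."""
    return all(
        (nxt, "L") in graph.get((cur, "R"), set())
        for cur, nxt in zip(path, path[1:])
    )


def cumulative_length(path: List[str], lengths: Dict[str, int]) -> int:
    """Sum of contig lengths along a path (overlaps ignored in this toy)."""
    return sum(lengths[name] for name in path)


# Hypothetical contigs from two draft assemblies, for illustration only.
lengths = {"a1": 100, "b7": 50, "a2": 25}
alignments = [(("a1", "R"), ("b7", "L")), (("b7", "R"), ("a2", "L"))]
graph = build_extension_graph(alignments)

candidate = ["a1", "b7", "a2"]
print(is_supported(candidate, graph), cumulative_length(candidate, lengths))
```

The abstract states that the output corresponds to a set of paths maximizing cumulative contig length; the sketch only scores a given path and does not attempt that optimization.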
Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen.
Stewart, Robert D; Auffret, Marc D; Warr, Amanda; Wiser, Andrew H; Press, Maximilian O; Langford, Kyle W; Liachko, Ivan; Snelling, Timothy J; Dewhurst, Richard J; Walker, Alan W; Roehe, Rainer; Watson, Mick
2018-02-28
The cow rumen is adapted for the breakdown of plant material into energy and nutrients, a task largely performed by enzymes encoded by the rumen microbiome. Here we present 913 draft bacterial and archaeal genomes assembled from over 800 Gb of rumen metagenomic sequence data derived from 43 Scottish cattle, using both metagenomic binning and Hi-C-based proximity-guided assembly. Most of these genomes represent previously unsequenced strains and species. The draft genomes contain over 69,000 proteins predicted to be involved in carbohydrate metabolism, over 90% of which do not have a good match in public databases. Inclusion of the 913 genomes presented here improves metagenomic read classification by sevenfold against our own data, and by fivefold against other publicly available rumen datasets. Thus, our dataset substantially improves the coverage of rumen microbial genomes in the public databases and represents a valuable resource for biomass-degrading enzyme discovery and studies of the rumen microbiome.
Riveros-Mckay, Fernando; Campos, Itzia; Giles-Gómez, Martha; Bolívar, Francisco
2014-01-01
Leuconostoc mesenteroides P45 was isolated from the traditional Mexican pulque beverage. We report its draft genome sequence, assembled in 6 contigs consisting of 1,874,188 bp and no plasmids. Genome annotation predicted a total of 1,800 genes, 1,687 coding sequences, 52 pseudogenes, 9 rRNAs, 51 tRNAs, 1 noncoding RNA, and 44 frameshifted genes. PMID:25377708
Singh, Deeksha; Chandrababunaidu, Mathu Malar; Panda, Arijit; Sen, Diya; Bhattacharyya, Sourav; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-03-05
The draft genome assembly of Hassallia byssoidea strain VB512170 with a genome size of ~13 Mb and 10,183 protein-coding genes in 62 scaffolds is reported here for the first time. This is a terrestrial hydrophobic cyanobacterium isolated from monuments in India. We report several copies of luciferase and antibiotic genes in this organism. Copyright © 2015 Singh et al.
Moreno-Avitia, Fabian; Lozano, Luis; Utrilla, Jose; Bolívar, Francisco; Escalante, Adelfo
2017-06-08
Pseudomonas chlororaphis strain ATCC 9446 is a biocontrol-related organism. We report here its draft genome sequence assembled into 35 contigs consisting of 6,783,030 bp. Genome annotation predicted a total of 6,200 genes, 6,128 coding sequences, 81 pseudogenes, 58 tRNAs, 4 noncoding RNAs (ncRNAs), and 41 frameshifted genes. Copyright © 2017 Moreno-Avitia et al.
Utomo, Condro; Tanjung, Zulfikar Achmad; Aditama, Redi; Buana, Rika Fithri Nurani; Pratomo, Antonius Dony Madu; Tryono, Reno; Liwang, Tony
2018-04-26
Ganoderma boninense is the dominant fungal pathogen of basal stem rot (BSR) disease on Elaeis guineensis. We sequenced the nuclear genome of mycelia using both Illumina and Pacific Biosciences platforms for assembly of scaffolds. The draft genome comprised 79.24 Mb, 495 scaffolds, and 26,226 predicted coding sequences. Copyright © 2018 Utomo et al.
Ferreira, Dalila Souza Santos; Kato, Rodrigo Bentes; Miranda, Fábio Malcher; da Costa Pinheiro, Kenny; Fonseca, Paula Luize Camargos; Tomé, Luiz Marcelo Ribeiro; Vaz, Aline Bruna Martins; Badotti, Fernanda; Ramos, Rommel Thiago Jucá; Brenig, Bertram; Azevedo, Vasco Ariston de Carvalho; Benevides, Raquel Guimarães; Góes-Neto, Aristóteles
2018-06-01
Herein, we present the draft genome of Trametes villosa isolate CCMB561, a wood-decaying Basidiomycota commonly found in tropical semiarid climate. The genome assembly was 57.98 Mb in size with an L50 of 691. A total of 16,711 putative protein-encoding genes was predicted, including 590 genes coding for carbohydrate-active enzymes (CAZy), directly involved in the decomposition of lignocellulosic materials. This is the first genome of this species of high interest in bioenergy research. The draft genome of Trametes villosa isolate CCMB561 will provide an important resource for future investigations in biofuel production, bioremediation and other green technologies.
Li, Qili; Bu, Junyan; Yu, Zhihe; Tang, Lihua; Huang, Suiping; Guo, Tangxun; Mo, Jianyou; Hsiang, Tom
2018-02-22
Here, we present a draft genome sequence of isolate 15060 of Colletotrichum fructicola, a causal agent of mango anthracnose. The final assembly consists of 1,048 scaffolds totaling 56,493,063 bp (G+C content, 53.38%) and 15,180 predicted genes. Copyright © 2018 Li et al.
Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri
2016-07-14
Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the virulence characteristics of the ISS strains relative to other clinical strains isolated on Earth. Copyright © 2016 Singh et al.
Moura, Quézia; Fernandes, Miriam R; Cerdeira, Louise; Santos, Ana Carolina M; de Souza, Tiago A; Ienne, Susan; Pignatari, Antonio Carlos C; Gales, Ana C; Silva, Rosa M; Lincopan, Nilton
2017-09-01
Here we report the draft genome sequence of a multidrug-resistant (MDR) Aeromonas hydrophila strain belonging to sequence type 508 (ST508) isolated from a human bloodstream infection. Assembly and annotation of this draft genome resulted in 5,028,498 bp and revealed the presence of 16S rRNA methylase rmtD and blaCTX-M-131 genes encoding high-level resistance to aminoglycosides and cephalosporins, respectively, as well as multiple virulence genes. This draft genome can provide significant information for understanding mechanisms on the establishment and treatment of infections caused by this pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Utturkar, Sagar M.; Bayer, Edward A.; Borovok, Ilya; ...
2016-09-29
Here, we and others have shown the utility of long sequence reads to improve genome assembly quality. In this study, we generated PacBio DNA sequence data to improve the assemblies of draft genomes for Clostridium thermocellum AD2, Clostridium thermocellum LQRI, and Pelosinus fermentans R7.
Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas
2018-02-10
Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.
Jo, Jihoon; Oh, Jooseong; Lee, Hyun-Gwan; Hong, Hyun-Hee; Lee, Sung-Gwon; Cheon, Seongmin; Kern, Elizabeth M A; Jin, Soyeong; Cho, Sung-Jin; Park, Joong-Ki; Park, Chungoo
2017-01-01
The Japanese sea cucumber (Apostichopus japonicus Selenka 1867) is an economically important species as a source of seafood and ingredient in traditional medicine. It is mainly found off the coasts of northeast Asia. Recently, substantial exploitation and widespread biotic diseases in A. japonicus have generated increasing conservation concern. However, the genomic knowledge base and resources available for researchers to use in managing this natural resource and to establish genetically based breeding systems for sea cucumber aquaculture are still in a nascent stage. A total of 312 Gb of raw sequences were generated using the Illumina HiSeq 2000 platform and assembled to a final size of 0.66 Gb, which is about 80.5% of the estimated genome size (0.82 Gb). We observed nucleotide-level heterozygosity within the assembled genome to be 0.986%. The resulting draft genome assembly comprising 132 607 scaffolds with an N50 value of 10.5 kb contains a total of 21 771 predicted protein-coding genes. We identified 6.6-14.5 million heterozygous single nucleotide polymorphisms in the assembled genome of the three natural color variants (green, red, and black), resulting in an estimated nucleotide diversity of 0.00146. We report the first draft genome of A. japonicus and provide a general overview of the genetic variation in the three major color variants of A. japonicus. These data will help provide a comprehensive view of the genetic, physiological, and evolutionary relationships among color variants in A. japonicus, and will be invaluable resources for sea cucumber genomic research. © The Author 2017. Published by Oxford University Press.
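The completeness figure in the preceding abstract follows directly from the two sizes it reports. As a quick arithmetic check, the sketch below recomputes the assembled fraction from the rounded values given in the text (0.66 Gb assembled, 0.82 Gb estimated). The function name is hypothetical, and because the inputs are rounded, agreement to one decimal place is the most that should be expected.

```python
def assembled_fraction(assembled_size: float, estimated_size: float) -> float:
    """Percentage of the estimated genome size captured by the assembly.

    Both sizes must be in the same unit (for example, Gb).
    """
    if estimated_size <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * assembled_size / estimated_size


# Values as reported (rounded) in the abstract above.
fraction = assembled_fraction(0.66, 0.82)
print(f"{fraction:.1f}% of the estimated genome size")
```

The same ratio can be inverted to back out an estimated genome size when an abstract reports only the assembly size and the percentage covered.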
Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F
2002-10-01
Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.
Draft genome sequence of field isolate Brucella melitensis strain 2007BM/1 from India.
Singh, D K; Kumar, Bablu; Shrinet, Garima; Singh, R P; Das, Aparajita; Mantur, B G; Abhishek; Pandey, Aruna; Mondal, Piyali; Sajjanar, B K; Doimari, Soni; Singh, Vijayata; Kumari, Reena; Tiwari, A K; Gandham, Ravi Kumar
2018-04-21
Brucellosis is one of the most widespread and important global zoonotic diseases and is endemic in many parts of India. Brucella melitensis is supposed to be the most pathogenic species for humans. Here we report the draft genome sequence of B. melitensis strain 2007BM/1 isolated from a human in India. Genomic DNA was extracted from Brucella culture and was sequenced using an Illumina MiSeq platform. The generated reads were assembled using three de novo assemblers and the draft genome was annotated. This monoisolate, with a genome length of 3,268,756 bp, was found to be resistant to azithromycin and trimethoprim/sulfamethoxazole but susceptible to tetracycline, ofloxacin, rifampicin, ciprofloxacin and doxycycline. The presence of virulence genes in the strain was identified. The results obtained will help in understanding drug resistance mechanisms and virulence factors in highly zoonotic B. melitensis and suggest the need for judicious use of antibiotics in livestock health and management practices. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Draft Genome Sequence of Hafnia paralvei Strain GTA-HAF03.
Kohlman, Melissa E; Carrillo, Catherine D; Wong, Alex
2015-02-19
Hafnia paralvei is a Gram-negative member of the Enterobacteriaceae family, closely related to the opportunistic pathogen Hafnia alvei. We report here the first draft genome sequence of H. paralvei, from the beef trim isolate GTA-HAF03, consisting of a 5.0-Mbp assembly encoding 4,382 proteins and 90 predicted RNAs. Copyright © 2015 Kohlman et al.
Draft Genome Sequence of the Tyramine Producer Enterococcus durans Strain IPLA 655
Ladero, Victor; Linares, Daniel M.; del Rio, Beatriz; Fernandez, Maria; Martin, M. Cruz
2013-01-01
We here report a 3.059-Mbp draft assembly for the genome of Enterococcus durans strain IPLA 655. This dairy isolate provides a model for studying the regulation of the biosynthesis of tyramine (a toxic compound). These results should aid our understanding of tyramine production and allow tyramine accumulation in food to be reduced. PMID:23682153
Gleaning evolutionary insights from the genome sequence of a probiotic yeast Saccharomyces boulardii
2013-01-01
Background The yeast Saccharomyces boulardii is used worldwide as a probiotic to alleviate the effects of several gastrointestinal diseases and control antibiotics-associated diarrhea. While many studies report the probiotic effects of S. boulardii, no genome information for this yeast is currently available in the public domain. Results We report the 11.4 Mbp draft genome of this probiotic yeast. The draft genome was obtained by assembling Roche 454 FLX + shotgun data into 194 contigs with an N50 of 251 Kbp. We compare our draft genome with all other Saccharomyces cerevisiae genomes. Conclusions Our analysis confirms the close similarity of S. boulardii to S. cerevisiae strains and provides a framework to understand the probiotic effects of this yeast, which exhibits unique physiological and metabolic properties. PMID:24148866
Khatri, Indu; Akhtar, Akil; Kaur, Kamaldeep; Tomar, Rajul; Prasad, Gandham Satyanarayana; Ramya, Thirumalai Nallan Chakravarthy; Subramanian, Srikrishna
2013-10-22
The yeast Saccharomyces boulardii is used worldwide as a probiotic to alleviate the effects of several gastrointestinal diseases and control antibiotics-associated diarrhea. While many studies report the probiotic effects of S. boulardii, no genome information for this yeast is currently available in the public domain. We report the 11.4 Mbp draft genome of this probiotic yeast. The draft genome was obtained by assembling Roche 454 FLX + shotgun data into 194 contigs with an N50 of 251 Kbp. We compare our draft genome with all other Saccharomyces cerevisiae genomes. Our analysis confirms the close similarity of S. boulardii to S. cerevisiae strains and provides a framework to understand the probiotic effects of this yeast, which exhibits unique physiological and metabolic properties.
Riveros-Mckay, Fernando; Campos, Itzia; Giles-Gómez, Martha; Bolívar, Francisco; Escalante, Adelfo
2014-11-06
Leuconostoc mesenteroides P45 was isolated from the traditional Mexican pulque beverage. We report its draft genome sequence, assembled in 6 contigs consisting of 1,874,188 bp and no plasmids. Genome annotation predicted a total of 1,800 genes, 1,687 coding sequences, 52 pseudogenes, 9 rRNAs, 51 tRNAs, 1 noncoding RNA, and 44 frameshifted genes. Copyright © 2014 Riveros-Mckay et al.
Ross, Daniel E; Marshall, Christopher W; May, Harold D; Norman, R Sean
2017-09-07
Draft genome sequences of Acetobacterium sp. strain MES1 and Desulfovibrio sp. strain MES5 were obtained from the metagenome of a cathode-associated community enriched within a microbial electrosynthesis system (MES). The draft genome sequences provide insight into the functional potential of these microorganisms within an MES and a foundation for future comparative analyses. Copyright © 2017 Ross et al.
Draft Genome of the Pearl Oyster Pinctada fucata: A Platform for Understanding Bivalve Biology
Takeuchi, Takeshi; Kawashima, Takeshi; Koyanagi, Ryo; Gyoja, Fuki; Tanaka, Makiko; Ikuta, Tetsuro; Shoguchi, Eiichi; Fujiwara, Mayuki; Shinzato, Chuya; Hisata, Kanako; Fujie, Manabu; Usami, Takeshi; Nagai, Kiyohito; Maeyama, Kaoru; Okamoto, Kikuhiko; Aoki, Hideo; Ishikawa, Takashi; Masaoka, Tetsuji; Fujiwara, Atushi; Endo, Kazuyoshi; Endo, Hirotoshi; Nagasawa, Hiromichi; Kinoshita, Shigeharu; Asakawa, Shuichi; Watabe, Shugo; Satoh, Nori
2012-01-01
The study of the pearl oyster Pinctada fucata is key to increasing our understanding of the molecular mechanisms involved in pearl biosynthesis and biology of bivalve molluscs. We sequenced the ∼1150-Mb genome at ∼40-fold coverage using the Roche 454 GS-FLX and Illumina GAIIx sequencers. The sequences were assembled into contigs with N50 = 1.6 kb (total contig assembly reached 1024 Mb) and scaffolds with N50 = 14.5 kb. The pearl oyster genome is AT-rich, with a GC content of 34%. DNA transposons, retrotransposons, and tandem repeat elements occupied 0.4, 1.5, and 7.9% of the genome, respectively (a total of 9.8%). Version 1.0 of the P. fucata draft genome contains 23 257 complete gene models, 70% of which are supported by the corresponding expressed sequence tags. The genes include those reported to have an association with bio-mineralization. Genes encoding transcription factors and signal transduction molecules are present in numbers comparable with genomes of other metazoans. Genome-wide molecular phylogeny suggests that lophotrochozoans represent a clade distinct from ecdysozoans. Our draft genome of the pearl oyster thus provides a platform for the identification of selection markers and genes for calcification, knowledge of which will be important in the pearl industry. PMID:22315334
Sutcliffe, Brodie; Rosewarne, Carly P.; Greenfield, Paul; Li, Dongmei
2013-01-01
The draft genome sequence of Thermotoga maritima A7A was recovered from a metagenomic assembly obtained from a high-temperature hydrocarbon reservoir in the Gippsland Basin, Australia. The organism is predicted to be a motile anaerobe with an array of catabolic enzymes for the degradation of numerous carbohydrates. PMID:24009120
O'Hair, Joshua A.; Li, Hui; Thapa, Santosh; Scholz, Matthew B.
2017-01-01
Novel cellulolytic microorganisms can potentially influence second-generation biofuel production. This paper reports the draft genome sequence of Bacillus licheniformis strain YNP1-TSU, isolated from hydrothermal-vegetative microbiomes inside Yellowstone National Park. The assembled sequence contigs predicted 4,230 coding genes, 66 tRNAs, and 10 rRNAs through automated annotation. PMID:28254968
Hyson, Peter; Shapiro, Joshua A; Wien, Michelle W
2015-10-08
Exiguobacterium sp. strain BMC-KP was isolated as part of a student environmental sampling project at Bryn Mawr College, PA. Sequencing of bacterial DNA assembled a 3.32-Mb draft genome. Analysis suggests the presence of genes for tolerance to cold and toxic metals, broad carbohydrate metabolism, and genes derived from phage. Copyright © 2015 Hyson et al.
Zhang, Yunzeng; Barthe, Gary; Grosser, Jude W; Wang, Nian
2016-07-08
Citrus blight is an overall decline disease of citrus trees that causes serious losses in the citrus industry worldwide. Although it was described more than one hundred years ago, its causal agent remains unknown and its pathophysiology is not well determined, which hampers our understanding of the disease and the design of suitable disease management. In this study, we sequenced and assembled the draft genome for Swingle citrumelo, an important citrus rootstock. The draft genome is approximately 280 Mb, which covers 74% of the estimated Swingle citrumelo genome, and the average coverage is around 15X. The draft genome of Swingle citrumelo enabled us to conduct transcriptome analysis of roots of blight and healthy Swingle citrumelo using RNA-seq. The RNA-seq was reliable, as evidenced by the high consistency between the RNA-seq analysis and quantitative reverse transcription PCR results (R² = 0.966). Comparison of the gene expression profiles between blight and healthy root samples revealed the molecular mechanism underlying the characteristic blight phenotypes, including decline, starch accumulation, and drought stress. The JA and ET biosynthesis and signaling pathways showed decreased transcript abundance, whereas SA-mediated defense-related genes showed increased transcript abundance in blight trees, suggesting that an unclassified biotrophic pathogen is involved in this disease. Overall, the Swingle citrumelo draft genome generated in this study will advance our understanding of plant biology and contribute to citrus breeding. Transcriptome analysis of blight and healthy trees deepened our understanding of the pathophysiology of citrus blight.
Yong, Hoi-Sen; Eamsobhana, Praphathip; Lim, Phaik-Eem; Razali, Rozaimi; Aziz, Farhanah Abdul; Rosli, Nurul Shielawati Mohamed; Poole-Johnson, Johan; Anwar, Arif
2015-08-01
Angiostrongylus cantonensis is a bursate nematode parasite that causes eosinophilic meningitis (or meningoencephalitis) in humans in many parts of the world. The genomic data from A. cantonensis will form a useful resource for comparative genomic and chemogenomic studies to aid the development of diagnostics and therapeutics. We have sequenced, assembled and annotated the genome of A. cantonensis. The genome size is estimated to be ∼260 Mb, with 17,280 genomic scaffolds, 91X coverage, 81.45% for complete and 93.95% for partial score based on CEGMA analysis of genome completeness. The number of predicted genes of ≥300 bp was 17,482. A total of 7737 predicted protein-coding genes of ≥50 amino acids were identified in the assembled genome. Among the proteins of known function, kinases are the most abundant followed by transferases. The draft genome contains 34 excretory-secretory proteins (ES), a minimum of 44 Nematode Astacin (NAS) metalloproteases, 12 Homeobox (HOX) genes, and 30 neurotransmitters. The assembled genome size (260 Mb) is larger than those of Pristionchus pacificus, Caenorhabditis elegans, Necator americanus, Caenorhabditis briggsae, Trichinella spiralis, Brugia malayi and Loa loa, but smaller than Haemonchus contortus and Ascaris suum. The repeat content (25%) is similar to H. contortus. The GC content (41.17%) is lower compared to P. pacificus (42.7%) and H. contortus (43.1%) but higher compared to C. briggsae (37.69%), A. suum (37.9%) and N. americanus (40.2%) while the scaffold N50 is 42,191. This draft genome will facilitate the understanding of many unresolved issues on the parasite and the disorder it causes. Copyright © 2015 Elsevier B.V. All rights reserved.
2018-01-01
Karnal bunt of wheat is an internationally quarantined fungal pathogen disease caused by Tilletia indica and affects the international commercial seed trade of wheat. We announce here the first improved draft genome assembly of a monoteliosporic culture of the Tilletia indica fungus, consisting of 787 scaffolds with an approximate total genome size of 31.83 Mbp, which is more accurate and near to complete than the previous version. PMID:29773612
Kumar, Anil; Mishra, Pallavi; Maurya, Ranjeet; Mishra, A K; Gupta, Vijai K; Ramteke, Pramod W; Marla, Soma S
2018-05-17
Karnal bunt of wheat is an internationally quarantined fungal pathogen disease caused by Tilletia indica and affects the international commercial seed trade of wheat. We announce here the first improved draft genome assembly of a monoteliosporic culture of the Tilletia indica fungus, consisting of 787 scaffolds with an approximate total genome size of 31.83 Mbp, which is more accurate and near to complete than the previous version. Copyright © 2018 Kumar et al.
Draft genome of the lined seahorse, Hippocampus erectus.
Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa; Xu, Junming; Fu, Hongtuo; Shi, Qiong
2017-06-01
The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. © The Authors 2017. Published by Oxford University Press.
Draft genome of the lined seahorse, Hippocampus erectus
Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa
2017-01-01
Background: The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. Findings: A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). Conclusion: We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. PMID:28444302
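The assembled-fraction figure in the record above can be checked with simple arithmetic. The sketch below is only an illustration of that check; the helper name `assembled_fraction` is an assumption introduced here, and the two inputs are the 458 Mb assembly size and the 489 Mb k-mer estimate quoted in the abstract.

```python
def assembled_fraction(assembly_mb, estimated_genome_mb):
    """Fraction of the estimated genome size represented by the assembly."""
    if estimated_genome_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembly_mb / estimated_genome_mb


# Figures quoted in the abstract: 458 Mb assembled, 489 Mb estimated.
fraction = assembled_fraction(458, 489)
print(f"{fraction:.1%}")
```

The ratio rounds to the 94% reported by the authors. Note that such a figure is only as reliable as the k-mer-based genome size estimate it is divided by.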
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W
2016-06-13
Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.
Sánchez-Nieves, Rubén; Facciotti, Marc; Saavedra-Collado, Sofía; Dávila-Santiago, Lizbeth; Rodríguez-Carrero, Roy; Montalvo-Rodríguez, Rafael
2016-03-01
The genus Haloarcula belongs to the family Halobacteriaceae which currently has 10 valid species. Here we report the draft genome sequence of strain SL3, a new species within this genus, isolated from the Solar Salterns of Cabo Rojo, Puerto Rico. Genome assembly performed using NGEN Assembler resulted in 18 contigs (N50 = 601,911 bp), the largest of which contains 1,023,775 bp. The genome consists of 3.97 Mb and has a GC content of 61.97%. Like all species of Haloarcula, the genome encodes heterogeneous copies of the small subunit ribosomal RNA. In addition, the genome includes 6 rRNAs, 48 tRNAs, and 3797 protein coding sequences. Several carbohydrate-active enzyme genes were found, as well as enzymes involved in the dihydroxyacetone processing pathway which are not found in other Haloarcula species. The NCBI accession number for this genome is LIUF00000000 and the strain deposit number is CECT9001.
Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-01-01
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-04-08
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology.
Ortiz, Elio M.; Berretta, Marcelo F.; Benintende, Graciela B.; Zandomeni, Rubén O.
2015-01-01
Geobacillus sp. isolate T6 was collected from a thermal spring in Salta, Argentina. The draft genome sequence (3,767,773 bp) of this isolate is represented by one major scaffold of 3.46 Mbp, a second one of 207 kbp, and 20 scaffolds of <13 kbp. The assembled sequences revealed 3,919 protein-coding genes. PMID:26184933
Nasser, Kother; Mustafa, Abu Salim; Khan, Mohd Wasif; Purohit, Prashant; Al-Obaid, Inaam; Dhar, Rita; Al-Fouzan, Wadha
2018-04-19
Acinetobacter baumannii is an important opportunistic pathogen in global health care settings. Its dissemination and multidrug resistance pose an issue with treatment and outbreak control. Here, we present draft genome assemblies of six multidrug-resistant clinical strains of A. baumannii isolated from patients admitted to one of two major hospitals in Kuwait. Copyright © 2018 Nasser et al.
Eastman, Alexander W.; Yuan, Ze-Chun
2015-01-01
Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, unanimously located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provided a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects. PMID:25653642
Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao
2013-01-01
Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219
Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao
2013-01-01
Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
Genome Sequence of the Necrotrophic Plant Pathogen Alternaria brassicicola Abra43
Belmas, Elodie; Briand, Martial; Kwasiborski, Anthony; Colou, Justine; N’Guyen, Guillaume; Iacomi, Béatrice; Grappin, Philippe; Campion, Claire; Simoneau, Philippe; Barret, Matthieu
2018-01-01
Alternaria brassicicola causes dark spot (or black spot) disease, which is one of the most common and destructive fungal diseases of Brassicaceae spp. worldwide. Here, we report the draft genome sequence of strain Abra43. The assembly comprises 29 scaffolds, with an N50 value of 2.1 Mb. The assembled genome was 31,036,461 bp in length, with a G+C content of 50.85%. PMID:29439047
Genome Sequence of an Endophytic Fungus, Fusarium solani JS-169, Which Has Antifungal Activity.
Kim, Jung A; Jeon, Jongbum; Park, Sook-Young; Kim, Ki-Tae; Choi, Gobong; Lee, Hyun-Jung; Kim, Yangsun; Yang, Hee-Sun; Yeo, Joo-Hong; Lee, Yong-Hwan; Kim, Soonok
2017-10-19
An endophytic fungus, Fusarium solani strain JS-169, isolated from a mulberry twig, showed considerable antifungal activity. Here, we report the draft genome sequence of this strain. The assembly comprises 17 scaffolds, with an N50 value of 4.93 Mb. The assembled genome was 45,813,297 bp in length, with a G+C content of 49.91%. Copyright © 2017 Kim et al.
Patil, Yogita; Müller, Nicolai; Schink, Bernhard; ...
2017-02-20
Anaerobium acetethylicum strain GluBS11T belongs to the family Lachnospiraceae within the order Clostridiales. It is a Gram-positive, non-motile and strictly anaerobic bacterium isolated from biogas slurry that was originally enriched with gluconate as carbon source (Patil, et al., Int J Syst Evol Microbiol 65:3289-3296, 2015). Here we describe the draft genome sequence of strain GluBS11T and provide a detailed insight into its physiological and metabolic features. The draft genome sequence generated 4,609,043 bp, distributed among 105 scaffolds assembled using the SPAdes genome assembler method. It comprises in total 4,132 genes, of which 4,008 were predicted to be protein coding genes, 124 RNA genes and 867 pseudogenes. The G+C content was 43.51 mol %. The annotated genome of strain GluBS11T contains putative genes coding for the pentose phosphate pathway, the Embden-Meyerhoff-Parnas pathway, the Entner-Doudoroff pathway and the tricarboxylic acid cycle. The genome revealed the presence of most of the necessary genes required for the fermentation of glucose and gluconate to acetate, ethanol, and hydrogen gas. However, a candidate gene for production of formate was not identified.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patil, Yogita; Müller, Nicolai; Schink, Bernhard
Anaerobium acetethylicum strain GluBS11T belongs to the family Lachnospiraceae within the order Clostridiales. It is a Gram-positive, non-motile and strictly anaerobic bacterium isolated from biogas slurry that was originally enriched with gluconate as carbon source (Patil, et al., Int J Syst Evol Microbiol 65:3289-3296, 2015). Here we describe the draft genome sequence of strain GluBS11T and provide a detailed insight into its physiological and metabolic features. The draft genome sequence generated 4,609,043 bp, distributed among 105 scaffolds assembled using the SPAdes genome assembler method. It comprises in total 4,132 genes, of which 4,008 were predicted to be protein coding genes, 124 RNA genes and 867 pseudogenes. The G+C content was 43.51 mol %. The annotated genome of strain GluBS11T contains putative genes coding for the pentose phosphate pathway, the Embden-Meyerhoff-Parnas pathway, the Entner-Doudoroff pathway and the tricarboxylic acid cycle. The genome revealed the presence of most of the necessary genes required for the fermentation of glucose and gluconate to acetate, ethanol, and hydrogen gas. However, a candidate gene for production of formate was not identified.
Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi
2013-04-10
Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
The draft genome of sweet orange (Citrus sinensis).
Xu, Qiang; Chen, Ling-Ling; Ruan, Xiaoan; Chen, Dijun; Zhu, Andan; Chen, Chunli; Bertrand, Denis; Jiao, Wen-Biao; Hao, Bao-Hai; Lyon, Matthew P; Chen, Jiongjiong; Gao, Song; Xing, Feng; Lan, Hong; Chang, Ji-Wei; Ge, Xianhong; Lei, Yang; Hu, Qun; Miao, Yin; Wang, Lun; Xiao, Shixin; Biswas, Manosh Kumar; Zeng, Wenfang; Guo, Fei; Cao, Hongbo; Yang, Xiaoming; Xu, Xi-Wen; Cheng, Yun-Jiang; Xu, Juan; Liu, Ji-Hong; Luo, Oscar Junhong; Tang, Zhonghui; Guo, Wen-Wu; Kuang, Hanhui; Zhang, Hong-Yu; Roose, Mikeal L; Nagarajan, Niranjan; Deng, Xiu-Xin; Ruan, Yijun
2013-01-01
Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.
Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M
2013-01-01
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Kim, Jung A; Jeon, Jongbum; Kim, Ki-Tae; Choi, Gobong; Park, Sook-Young; Lee, Hyun-Jung; Shim, Sang-Hee; Lee, Yong-Hwan; Kim, Soonok
2017-08-03
An endophytic fungus, Gaeumannomyces sp. strain JS-464, is capable of producing a number of secondary metabolites which showed significant nitric oxide reduction activity. The draft genome assembly has a size of 53,151,282 bp, with a G+C content of 53.11% consisting of 80 scaffolds with an N 50 of 7.46 Mbp. Copyright © 2017 Kim et al.
Auffret, Pauline; Segura, Audrey; Klopp, Christophe; Bouchez, Olivier; Kérourédan, Monique; Bibbal, Delphine; Brugère, Hubert; Forano, Evelyne
2017-01-01
Enterohemorrhagic Escherichia coli (EHEC) with serotype O157:H7 is a major foodborne pathogen. Here, we report the draft genome sequence of EHEC O157:H7 strain MC2 isolated from cattle in France. The assembly contains 5,400,376 bp that encode 5,914 predicted genes (5,805 protein-encoding genes and 109 RNA genes). PMID:28983004
Oldeschulte, David L; Halley, Yvette A; Wilson, Miranda L; Bhattarai, Eric K; Brashear, Wesley; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Rollins, Dale; Peterson, Markus J; Bickhart, Derek M; Decker, Jared E; Sewell, John F; Seabury, Christopher M
2017-09-07
Northern bobwhite ( Colinus virginianus ; hereafter bobwhite) and scaled quail ( Callipepla squamata ) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA. Copyright © 2017 Oldeschulte et al.
Draft genome sequence of non-shiga toxin-producing Escherichia coli O157 NCCP15738.
Kwon, Taesoo; Kim, Jung-Beom; Bak, Young-Seok; Yu, Young-Bin; Kwon, Ki Sung; Kim, Won; Cho, Seung-Hak
2016-01-01
The non-shiga toxin-producing Escherichia coli (non-STEC) O157 is a pathogenic strain that causes diarrhea but does not cause hemolytic-uremic syndrome or hemorrhagic colitis. Here, we present the 5-Mb draft genome sequence of non-STEC O157 NCCP15738, which was isolated from the feces of a Korean patient with diarrhea, and describe its features and the structural basis for its genome evolution. A total of 565 Mbp of paired-end reads were generated using the Illumina HiSeq 2000 platform. The reads were assembled de novo into 135 scaffolds. The assembled genome size of NCCP15738 was 5,005,278 bp with an N50 value of 142,450 bp and 50.65 % G+C content. Using Rapid Annotation using Subsystem Technology analysis, we predicted 4780 ORFs and 31 RNA genes. The evolutionary tree was inferred from multiple sequence alignment of 45 E. coli species. The most closely related neighbor of NCCP15738 indicated by whole-genome phylogeny was E. coli UMNK88, but that indicated by multilocus sequence analysis was E. coli DH1(ME8569). A comparison between the NCCP15738 genome and those of reference strains, E. coli K-12 substr. MG1655 and EHEC O157:H7 EDL933, by bioinformatics analyses revealed unique genes in NCCP15738 associated with lysis protein S, two-component signal transduction system, conjugation, the flagellum, nucleotide-binding proteins, and metal-ion binding proteins. Notably, NCCP15738 has a dual flagella system like that in Vibrio parahaemolyticus, Aeromonas spp., and Rhodospirillum centenum. The draft genome sequence and the results of bioinformatics analysis of NCCP15738 provide the basis for understanding the genomic evolution of this strain.
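Several abstracts in this listing summarize an assembly by its N50 and G+C content. As a minimal sketch of how those two statistics are commonly computed, the Python below uses made-up toy scaffolds rather than data from any abstract here. The function names `n50` and `gc_percent` are hypothetical, and the choice to exclude ambiguous bases (N) from the G+C denominator while counting them in scaffold length is an assumption; individual tools and papers may differ on both points.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("assembly has no sequence")
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable")


def gc_percent(sequences: Iterable[str]) -> float:
    """Return G+C content as a percentage of unambiguous (A/C/G/T) bases.

    Assumption: ambiguous bases such as N are ignored in the denominator.
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Toy scaffolds, invented for illustration only.
toy = ["GGCCGGCCAT", "ATGCNN", "AT"]
print(n50(len(s) for s in toy))   # 10
print(round(gc_percent(toy), 2))  # 62.5
```

With real data the lengths and sequences would come from a FASTA parser; that step is omitted here to keep the sketch self-contained.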
Endogenous hepadnaviruses, bornaviruses and circoviruses in snakes
Gilbert, C.; Meik, J. M.; Dashevsky, D.; Card, D. C.; Castoe, T. A.; Schaack, S.
2014-01-01
We report the discovery of endogenous viral elements (EVEs) from Hepadnaviridae, Bornaviridae and Circoviridae in the speckled rattlesnake, Crotalus mitchellii, the first viperid snake for which a draft whole genome sequence assembly is available. Analysis of the draft assembly reveals genome fragments from the three virus families were inserted into the genome of this snake over the past 50 Myr. Cross-species PCR screening of orthologous loci and computational scanning of the python and king cobra genomes reveals that circoviruses integrated most recently (within the last approx. 10 Myr), whereas bornaviruses and hepadnaviruses integrated at least approximately 13 and approximately 50 Ma, respectively. This is, to our knowledge, the first report of circo-, borna- and hepadnaviruses in snakes and the first characterization of non-retroviral EVEs in non-avian reptiles. Our study provides a window into the historical dynamics of viruses in these host lineages and shows that their evolution involved multiple host-switches between mammals and reptiles. PMID:25080342
The Draft Assembly of the Radically Organized Stylonychia lemnae Macronuclear Genome
Aeschlimann, Samuel H.; Jönsson, Franziska; Postberg, Jan; Stover, Nicholas A.; Petera, Robert L.; Lipps, Hans-Joachim; Nowacki, Mariusz; Swart, Estienne C.
2014-01-01
Stylonychia lemnae is a classical model single-celled eukaryote, and a quintessential ciliate typified by dimorphic nuclei: a small, germline micronucleus and a massive, vegetative macronucleus. The genome within Stylonychia’s macronucleus has a very unusual architecture, comprising variably and highly amplified “nanochromosomes,” each usually encoding a single gene with a minimal amount of surrounding noncoding DNA. As only a tiny fraction of the Stylonychia genes has been sequenced, and to promote research using this organism, we sequenced its macronuclear genome. We report the analysis of the 50.2-Mb draft S. lemnae macronuclear genome assembly, containing in excess of 16,000 complete nanochromosomes, assembled as fewer than 20,000 contigs. We found considerable conservation of fundamental genomic properties between S. lemnae and its close relative, Oxytricha trifallax, including nanochromosomal gene synteny, alternative fragmentation, and copy number. Protein domain searches in Stylonychia revealed two new telomere-binding protein homologs and the presence of linker histones. Among the diverse histone variants of S. lemnae and O. trifallax, we found divergent, coexpressed variants corresponding to four of the five core nucleosomal proteins (H1.2, H2A.6, H2B.4, and H3.7) suggesting that these ciliates may possess specialized nucleosomes involved in genome processing during nuclear differentiation. The assembly of the S. lemnae macronuclear genome demonstrates that largely complete, well-assembled highly fragmented genomes of similar size and complexity may be produced from one library and lane of Illumina HiSeq 2000 shotgun sequencing. The provision of the S. lemnae macronuclear genome sets the stage for future detailed experimental studies of chromatin-mediated, RNA-guided developmental genome rearrangements. PMID:24951568
Strategies and tools for whole genome alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas
2002-11-25
The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.
Liu, Xianghui; Arumugam, Krithika; Natarajan, Gayathri; Seviour, Thomas W; Drautz-Moses, Daniela I; Wuertz, Stefan; Law, Yingyu; Williams, Rohan B H
2018-05-10
Here, we present the draft genome sequence of an anaerobic ammonium-oxidizing bacterium (AnAOB), " Candidatus Brocadia," which was enriched in an anammox reactor. A 3.2-Mb genome sequence comprising 168 contigs was assembled, in which 2,765 protein-coding genes, 47 tRNAs, and one each of 5S, 16S, and 23S rRNAs were annotated. No evidence for the presence of a nitric oxide-forming nitrite reductase was found. Copyright © 2018 Liu et al.
Mets, David G; Brainard, Michael S
2018-01-01
Background: Vocal learning in songbirds has emerged as a powerful model for sensorimotor learning. Neurobehavioral studies of Bengalese finch (Lonchura striata domestica) song, naturally more variable and plastic than songs of other finch species, have demonstrated the importance of behavioral variability for initial learning, maintenance, and plasticity of vocalizations. However, the molecular and genetic underpinnings of this variability and the learning it supports are poorly understood. Findings: To establish a platform for the molecular analysis of behavioral variability and plasticity, we generated an initial draft assembly of the Bengalese finch genome from a single male animal to 151× coverage and an N50 of 3.0 MB. Furthermore, we developed an initial set of gene models using RNA-seq data from 8 samples that comprise liver, muscle, cerebellum, brainstem/midbrain, and forebrain tissue from juvenile and adult Bengalese finches of both sexes. Conclusions: We provide a draft Bengalese finch genome and gene annotation to facilitate the study of the molecular-genetic influences on behavioral variability and the process of vocal learning. These data will directly support many avenues for the identification of genes involved in learning, including differential expression analysis, comparative genomic analysis (through comparison to existing avian genome assemblies), and derivation of genetic maps for linkage analysis. Bengalese finch gene models and sequences will be essential for subsequent manipulation (molecular or genetic) of genes and gene products, enabling novel mechanistic investigations into the role of variability in learned behavior. PMID:29618046
Colquitt, Bradley M; Mets, David G; Brainard, Michael S
2018-03-01
Vocal learning in songbirds has emerged as a powerful model for sensorimotor learning. Neurobehavioral studies of Bengalese finch (Lonchura striata domestica) song, naturally more variable and plastic than songs of other finch species, have demonstrated the importance of behavioral variability for initial learning, maintenance, and plasticity of vocalizations. However, the molecular and genetic underpinnings of this variability and the learning it supports are poorly understood. To establish a platform for the molecular analysis of behavioral variability and plasticity, we generated an initial draft assembly of the Bengalese finch genome from a single male animal to 151× coverage and an N50 of 3.0 MB. Furthermore, we developed an initial set of gene models using RNA-seq data from 8 samples that comprise liver, muscle, cerebellum, brainstem/midbrain, and forebrain tissue from juvenile and adult Bengalese finches of both sexes. We provide a draft Bengalese finch genome and gene annotation to facilitate the study of the molecular-genetic influences on behavioral variability and the process of vocal learning. These data will directly support many avenues for the identification of genes involved in learning, including differential expression analysis, comparative genomic analysis (through comparison to existing avian genome assemblies), and derivation of genetic maps for linkage analysis. Bengalese finch gene models and sequences will be essential for subsequent manipulation (molecular or genetic) of genes and gene products, enabling novel mechanistic investigations into the role of variability in learned behavior.
Seabury, Christopher M.; Dowd, Scot E.; Seabury, Paul M.; Raudsepp, Terje; Brightsmith, Donald J.; Liboriussen, Poul; Halley, Yvette; Fisher, Colleen A.; Owens, Elaine; Viswanathan, Ganesh; Tizard, Ian R.
2013-01-01
Data deposition to NCBI Genomes This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N’s). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS studies. 
We also observed evidence for genes and noncoding loci that displayed extreme conservation across the three avian lineages, thereby reflecting their likely biological and developmental importance among birds. PMID:23667475
Li, Gang; Hillier, LaDeana W; Grahn, Robert A; Zimin, Aleksey V; David, Victor A; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O'Brien, Stephen J; Minx, Pat; Wilson, Richard K; Lyons, Leslie A; Warren, Wesley C; Murphy, William J
2016-06-01
High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. Copyright © 2016 Li et al.
Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.
Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo
2018-05-01
Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. For further understanding ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.
Simões-Araújo, Jean Luiz; Rumjanek, Norma Gouvêa; Xavier, Gustavo Ribeiro; Zilli, Jerri Édson
The strain BR 3351 T (Bradyrhizobium manausense) was obtained from nodules of cowpea (Vigna unguiculata L. Walp) growing in soil collected from the Amazon rainforest. Furthermore, it was observed that the strain has a high capacity to fix nitrogen in symbiosis with cowpea. We report here the draft genome sequence of strain BR 3351 T. The information presented will be important for comparative analysis of nodulation and nitrogen fixation in diazotrophic bacteria. A draft genome of 9,145,311 bp with 62.9% GC content was assembled in 127 scaffolds from 100-bp paired-end reads generated on the Illumina MiSeq system. The RAST annotation identified 8603 coding sequences and 51 RNA genes, classified in 504 subsystems. Published by Elsevier Editora Ltda.
Huang, Jing; Qiao, Zi Xu; Tang, Jing Wei; Wang, Gejiao
2015-01-01
Pontibacillus yanchengensis Y32(T) is an aerobic, motile, Gram-positive, endospore-forming, and moderately halophilic bacterium isolated from a salt field. In this study, we describe the features of P. yanchengensis strain Y32(T) together with a comparison with four other Pontibacillus genomes. The 4,281,464 bp high-quality-draft genome of strain Y32(T) is arranged into 153 contigs containing 3,965 protein-coding genes and 77 RNA-encoding genes. The genome of strain Y32(T) possesses many genes related to its halophilic character, flagellar assembly and chemotaxis to support its survival in a salt-rich environment.
Li, Chien-Feng; Tang, Hui-Ling; Chiou, Chien-Shun; Tung, Kwong-Chung; Lu, Min-Chi; Lai, Yi-Chyi
2018-03-01
Klebsiella spp. are regarded as major pathogens causing infections in humans and various animals. Here we report the draft genome sequence of a CTX-M-type β-lactamase-producing Klebsiella quasipneumoniae subsp. similipneumoniae strain CHKP0062 isolated from a Yellow-margined Box turtle. An Illumina-Solexa platform was used to sequence the genome of CHKP0062. Qualified reads were assembled de novo using Velvet. The draft genome was annotated by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). The resistome and virulome of the strain were investigated. A total of 5423 protein-coding sequences, 87 tRNAs, 24 rRNAs and 12 ncRNAs were identified in the 5 699 275-bp genome. CHKP0062 was assigned to sequence type ST2131 with the K-loci type as KL67. No virulence-associated genes were identified. However, numerous antimicrobial resistance genes were present in this strain. Plasmid contigs were assembled and revealed homology to the multidrug resistance plasmids pC15-K, pCTX-M3 and pKF3-94, with the carriage of the class A β-lactamase genes bla TEM-1b and bla CTX-M-3 . The genome sequence reported in this study will be useful for comparative genomic analysis regarding the dissemination of clinically important antibiotic resistance genes among Klebsiella spp. isolated from humans and animals. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.
Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B
2013-01-01
A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Draft genome sequence of the silver pomfret fish, Pampus argenteus.
AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar
2016-01-01
Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
Garza-Ramos, Ulises; Tamayo-Legorreta, Elsa; Arellano-Quintanilla, Doris María; Rodriguez-Medina, Nadia; Silva-Sanchez, Jesús; Catalan-Najera, Juan; Rocha-Martínez, Marisol Karina; Bravo-Díaz, María Asunción
2018-01-01
A colistin-resistant mcr-1-carrying Escherichia coli strain, RC2-007, was isolated from a swine farm in Mexico. This extraintestinal and uropathogenic strain of E. coli belongs to serotype O89:H9 and sequence type 744. Assembly and annotation resulted in a 4.9-Mb draft genome that revealed the presence of plasmid-mediated mcr-1-ISApI1 genes as part of a prophage. PMID:29519827
Whole-Genome Sequence of the Soil Bacterium Micrococcus sp. KBS0714.
Kuo, V; Shoemaker, W R; Muscarella, M E; Lennon, J T
2017-08-10
We present here a draft genome assembly of Micrococcus sp. KBS0714, which was isolated from agricultural soil. The genome provides insight into the strategies that Micrococcus spp. use to contend with environmental stressors such as desiccation and starvation in environmental and host-associated ecosystems. Copyright © 2017 Kuo et al.
The draft genome of Globodera ellingtonae
USDA-ARS's Scientific Manuscript database
Globodera ellingtonae is a newly described potato cyst nematode found in Idaho, Oregon, and Argentina. Here we present a genome assembly for G. ellingtonae, a relative of the quarantine nematodes G. pallida and G. rostochiensis, produced using data from Illumina and Pacific Biosciences sequencing te...
The Transcriptomics of Secondary Growth and Wood Formation in Conifers
Carvalho, Ana; Paiva, Jorge; Louzada, José; Lima-Brito, José
2013-01-01
In recent years, forestry scientists have adapted genomics and next-generation sequencing (NGS) technologies to the search for candidate genes related to the transcriptomics of secondary growth and wood formation in several tree species. Gymnosperms, in particular the conifers, are ecologically and economically important, namely for the production of wood and other forestry end products. Until very recently, no whole-genome sequence of a conifer was available. Due to the gradual improvement of NGS technologies and the associated bioinformatics tools, two draft assemblies of the whole-genome sequences of Picea abies and Picea glauca arose in the current year. These draft genome assemblies will bring new insights into the structure, content, and evolution of conifer genomes. Furthermore, new directions in the forestry, breeding and research of conifers are discussed. The identification of genes associated with the xylem transcriptome and the knowledge of their regulatory mechanisms will provide less time-consuming breeding cycles and high accuracy in the selection of traits related to wood production and quality. PMID:24288610
Barrero, Roberto A; Guerrero, Felix D; Black, Michael; McCooke, John; Chapman, Brett; Schilkey, Faye; Pérez de León, Adalberto A; Miller, Robert J; Bruns, Sara; Dobry, Jason; Mikhaylenko, Galina; Stormo, Keith; Bell, Callum; Tao, Quanzhou; Bogden, Robert; Moolhuijzen, Paula M; Hunter, Adam; Bellgard, Matthew I
2017-08-01
The genome of the cattle tick Rhipicephalus microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp in length and consists of approximately 70% repetitive DNA. We report the draft assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genome. Our hybrid approach produced an assembly consisting of 2.0 Gbp represented in 195,170 scaffolds with an N50 of 60,284 bp. The Rmi v2.0 assembly is 51.46% repetitive with a large fraction of unclassified repeats, short interspersed elements, long interspersed elements and long terminal repeats. We identified 38,827 putative R. microplus gene loci, of which 24,758 were protein coding genes (≥100 amino acids). OrthoMCL comparative analysis against 11 selected species including insects and vertebrates identified 10,835 and 3,423 protein coding gene loci that are unique to R. microplus or common to both R. microplus and Ixodes scapularis ticks, respectively. We identified 191 microRNA loci, of which 168 have similarity to known miRNAs and 23 represent novel miRNA families. We identified the genomic loci of several highly divergent R. microplus esterases with sequence similarity to acetylcholinesterase. Additionally, we report the finding of a novel cytochrome P450 CYP41 homolog that shows similar protein folding structures to CYP41 proteins known to be involved in acaricide resistance. Copyright © 2017 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirsch, Candice N.; Hirsch, Cory D.; Brohammer, Alex B.
Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.
Soifer, Ilya; Barad, Omer; Shem-Tov, Doron; Baruch, Kobi; Lu, Fei; Hernandez, Alvaro G.; Wright, Chris L.; Koehler, Klaus; Buell, C. Robin; de Leon, Natalia
2016-01-01
Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools. PMID:27803309
The sequence and de novo assembly of the giant panda genome
Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2013-01-01
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Gupta, Sonal; Nawaz, Kashif; Parween, Sabiha; Roy, Riti; Sahu, Kamlesh; Kumar Pole, Anil; Khandal, Hitaishi; Srivastava, Rishi; Kumar Parida, Swarup; Chattopadhyay, Debasis
2017-02-01
Cicer reticulatum L. is the wild progenitor of the fourth most important legume crop, chickpea (C. arietinum L.). We assembled short-read sequences into a 416 Mb draft genome of C. reticulatum and anchored 78% (327 Mb) of this assembly to eight linkage groups. Genome annotation predicted 25,680 protein-coding genes covering more than 90% of predicted gene space. The genome assembly shared substantial synteny and conservation of gene order with the genome of the model legume Medicago truncatula. Resistance gene homologs of wild and domesticated chickpeas showed high sequence homology and conserved synteny. Comparison of gene sequences and nucleotide diversity using 66 wild and domesticated chickpea accessions suggested that the desi type chickpea was genetically closer to the wild species than the kabuli type. Comparative analyses predicted gene flow between the wild and the cultivated species during domestication. Molecular diversity and population genetic structure determination using 15,096 genome-wide single nucleotide polymorphisms revealed an admixed domestication pattern among cultivated (desi and kabuli) and wild chickpea accessions belonging to three population groups, reflecting a significant influence of parentage or geographical origin on their cultivar-specific population classification. The assembly and the polymorphic sequence resources presented here would facilitate the study of chickpea domestication and targeted use of wild Cicer germplasm for agronomic trait improvement in chickpea. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Draft genomes of two blister beetles Hycleus cichorii and Hycleus phaleratus
Wu, Yuan-Ming; Li, Jiang
2018-01-01
Background: Commonly known as blister beetles or Spanish fly, there are more than 1500 species in the Meloidae family (Hexapoda: Coleoptera: Tenebrionoidea) that produce the potent defensive blistering agent cantharidin. Cantharidin and its derivatives have been used to treat cancers such as liver, stomach, lung, and esophageal cancers. Hycleus cichorii and Hycleus phaleratus are the most commercially important blister beetles in China due to their ability to biosynthesize this potent vesicant. However, there is a lack of genome reference, which has hindered development of studies on the biosynthesis of cantharidin and a better understanding of its biology and pharmacology. Results: We report 2 draft genomes and quantified gene sets for the blister beetles H. cichorii and H. phaleratus, 2 complex genomes with >72% repeats and approximately 1% heterozygosity, using Illumina sequencing data. An integrated assembly pipeline was performed for assembly, and most of the coding regions were obtained. Benchmarking universal single-copy orthologs (BUSCO) assessment showed that our assembly obtained more than 98% of the Endopterygota universal single-copy orthologs. Comparison analysis showed that the completeness of coding genes in our assembly was comparable to other beetle genomes such as Dendroctonus ponderosae and Agrilus planipennis. Gene annotation yielded 13 813 and 13 725 protein-coding genes in H. cichorii and H. phaleratus, of which approximately 89% were functionally annotated. BUSCO assessment showed that approximately 86% and 84% of the Endopterygota universal single-copy orthologs were annotated completely in these 2 gene sets, whose completeness is comparable to that of D. ponderosae and A. planipennis. Conclusions: Assembly of both blister beetle genomes provides a valuable resource for future biosynthesis of cantharidin and comparative genomic studies of blister beetles and other beetles. PMID:29444297
Draft genomes of two blister beetles Hycleus cichorii and Hycleus phaleratus.
Wu, Yuan-Ming; Li, Jiang; Chen, Xiang-Sheng
2018-03-01
Commonly known as blister beetles or Spanish fly, there are more than 1500 species in the Meloidae family (Hexapoda: Coleoptera: Tenebrionoidea) that produce the potent defensive blistering agent cantharidin. Cantharidin and its derivatives have been used to treat cancers such as liver, stomach, lung, and esophageal cancers. Hycleus cichorii and Hycleus phaleratus are the most commercially important blister beetles in China due to their ability to biosynthesize this potent vesicant. However, there is a lack of genome reference, which has hindered development of studies on the biosynthesis of cantharidin and a better understanding of its biology and pharmacology. We report 2 draft genomes and quantified gene sets for the blister beetles H. cichorii and H. phaleratus, 2 complex genomes with >72% repeats and approximately 1% heterozygosity, using Illumina sequencing data. An integrated assembly pipeline was performed for assembly, and most of the coding regions were obtained. Benchmarking universal single-copy orthologs (BUSCO) assessment showed that our assembly obtained more than 98% of the Endopterygota universal single-copy orthologs. Comparison analysis showed that the completeness of coding genes in our assembly was comparable to other beetle genomes such as Dendroctonus ponderosae and Agrilus planipennis. Gene annotation yielded 13 813 and 13 725 protein-coding genes in H. cichorii and H. phaleratus, of which approximately 89% were functionally annotated. BUSCO assessment showed that approximately 86% and 84% of the Endopterygota universal single-copy orthologs were annotated completely in these 2 gene sets, whose completeness is comparable to that of D. ponderosae and A. planipennis. Assembly of both blister beetle genomes provides a valuable resource for future biosynthesis of cantharidin and comparative genomic studies of blister beetles and other beetles.
Endogenous hepadnaviruses, bornaviruses and circoviruses in snakes.
Gilbert, C; Meik, J M; Dashevsky, D; Card, D C; Castoe, T A; Schaack, S
2014-09-22
We report the discovery of endogenous viral elements (EVEs) from Hepadnaviridae, Bornaviridae and Circoviridae in the speckled rattlesnake, Crotalus mitchellii, the first viperid snake for which a draft whole genome sequence assembly is available. Analysis of the draft assembly reveals genome fragments from the three virus families were inserted into the genome of this snake over the past 50 Myr. Cross-species PCR screening of orthologous loci and computational scanning of the python and king cobra genomes reveals that circoviruses integrated most recently (within the last approx. 10 Myr), whereas bornaviruses and hepadnaviruses integrated at least approximately 13 and approximately 50 Ma, respectively. This is, to our knowledge, the first report of circo-, borna- and hepadnaviruses in snakes and the first characterization of non-retroviral EVEs in non-avian reptiles. Our study provides a window into the historical dynamics of viruses in these host lineages and shows that their evolution involved multiple host-switches between mammals and reptiles. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
The draft genome of tropical fruit durian (Durio zibethinus).
Teh, Bin Tean; Lim, Kevin; Yong, Chern Han; Ng, Cedric Chuan Young; Rao, Sushma Ramesh; Rajasegaran, Vikneswari; Lim, Weng Khong; Ong, Choon Kiat; Chan, Ki; Cheng, Vincent Kin Yuen; Soh, Poh Sheng; Swarup, Sanjay; Rozen, Steven G; Nagarajan, Niranjan; Tan, Patrick
2017-11-01
Durian (Durio zibethinus) is a Southeast Asian tropical plant known for its hefty, spine-covered fruit and sulfury and onion-like odor. Here we present a draft genome assembly of D. zibethinus, representing the third plant genus in the Malvales order and first in the Helicteroideae subfamily to be sequenced. Single-molecule sequencing and chromosome contact maps enabled assembly of the highly heterozygous durian genome at chromosome-scale resolution. Transcriptomic analysis showed upregulation of sulfur-, ethylene-, and lipid-related pathways in durian fruits. We observed paleopolyploidization events shared by durian and cotton and durian-specific gene expansions in MGL (methionine γ-lyase), associated with production of volatile sulfur compounds (VSCs). MGL and the ethylene-related gene ACS (aminocyclopropane-1-carboxylic acid synthase) were upregulated in fruits concomitantly with their downstream metabolites (VSCs and ethylene), suggesting a potential association between ethylene biosynthesis and methionine regeneration via the Yang cycle. The durian genome provides a resource for tropical fruit biology and agronomy.
Chen, Chaoyang; Sun, Chongran; Wu, Yi-Rui
2018-03-21
A wild-type solventogenic strain, Clostridium diolis WST, isolated from mangrove sediments, was found to produce large amounts of butanol and acetone with negligible levels of ethanol and acids from glucose via a unique acetone-butanol (AB) fermentation pathway. Genomic sequencing yielded an assembled draft genome of strain WST of 5.85 Mb with a GC content of 29.69%, containing 5263 genes, including 5049 annotated protein-coding sequences. Among these annotated genes, the butanol dehydrogenase gene (bdh) was found in higher abundance in strain WST than in other clostridial strains, which is positively related to its highly efficient production of butanol. We therefore present a draft genome sequence analysis of strain WST, which should facilitate further understanding of the solventogenic mechanism of this microorganism.
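The GC content quoted in the record above (29.69%) is a simple base-composition ratio. As a minimal sketch, not the authors' pipeline, it could be computed as follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption for this sketch: ambiguous symbols such as N are skipped
    rather than counted in the denominator.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


# Toy sequence: 8 unambiguous bases, 2 of them G or C.
print(gc_content("ATGCATNNAT"))  # 0.25
```

For a multi-contig draft assembly, the same ratio would be taken over the concatenation of all contigs rather than averaged per contig.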
Sequencing and assembly of the 22-gb loblolly pine genome.
Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H
2014-03-01
Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
A reference genome of the European beech (Fagus sylvatica L.).
Mishra, Bagdevi; Gupta, Deepak K; Pfenninger, Markus; Hickler, Thomas; Langer, Ewald; Nam, Bora; Paule, Juraj; Sharma, Rahul; Ulaszewski, Bartosz; Warmbier, Joanna; Burczyk, Jaroslaw; Thines, Marco
2018-06-01
The European beech is arguably the most important climax broad-leaved tree species in Central Europe, widely planted for its valuable wood. Here, we report the 542 Mb draft genome sequence of an up to 300-year-old individual (Bhaga) from an undisturbed stand in the Kellerwald-Edersee National Park in central Germany. Using a hybrid assembly approach, Illumina reads with short- and long-insert libraries, coupled with long Pacific Biosciences reads, we obtained an assembled genome size of 542 Mb, in line with flow cytometric genome size estimation. The largest scaffold was 1.15 Mb, the N50 length was 145 kb, and the L50 count was 983. The assembly contained 0.12% of Ns. A Benchmarking with Universal Single-Copy Orthologs (BUSCO) analysis retrieved 94% complete BUSCO genes, well in the range of other high-quality draft genomes of trees. A total of 62,012 protein-coding genes were predicted, assisted by transcriptome sequencing. In addition, we report an efficient method for extracting high-molecular-weight DNA from dormant buds, by which contamination by environmental bacteria and fungi was kept to a minimum. The assembled genome will be a valuable resource and reference for future population genomics studies on the evolution and past climate change adaptation of beech and will be helpful for identifying genes, e.g., involved in drought tolerance, in order to select and breed individuals to adapt forestry to climate change in Europe. A continuously updated genome browser and download page can be accessed from beechgenome.net, which will include future genome versions of the reference individual Bhaga, as new sequencing approaches develop.
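Several records in this listing report N50 and L50 without defining them. Under the usual definition, N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly, and L50 is the number of scaffolds in that set. The sketch below illustrates that definition on made-up lengths; the function name `n50_l50` and the toy data are hypothetical and are not taken from any of the assemblies described here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of scaffold or contig lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one scaffold length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy assembly of 300 bp in total: 80 + 70 = 150 bp reaches the halfway mark.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

Read this way, an N50 of 145 kb with an L50 of 983 means that the 983 longest scaffolds, each at least 145 kb long, account for at least half of the assembled sequence.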
Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan
2016-07-01
This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidence on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop, and its origin and phylogenetic position in the tribe Brassiceae are controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98% of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36% of the genome was repetitive sequences, and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidence that the current form of nine chromosomes in radish was rearranged from the chromosomes of a hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into the evolution of the mesohexaploid genomes in the tribe Brassiceae.
Draft genome sequence of the fish pathogen Flavobacterium columnare strain CSF-298-10
USDA-ARS's Scientific Manuscript database
We announce the genome assembly of Flavobacterium columnare strain CSF-298-10, a strain isolated from an outbreak of Columnaris disease at a commercial trout farm in the Snake River Valley, Idaho, USA. The complete genome consists of 13 contigs totaling 3,284,579 bp, an average G+C content of 31.5% and 2933...
Genomics - the new rock and roll?
Dunham, I
2000-10-01
The end of the beginning of the Human Genome Project came on 26 June, when the working draft or first assembly was announced. Here, Ian Dunham, who led the group at the Sanger Centre that produced the first complete sequence of a human chromosome, reflects on how it felt to be with the genome project from the beginning.
The Genome of the Cucumber, Cucumis Sativus L
USDA-ARS's Scientific Manuscript database
Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequen...
Dichosa, Armand E. K.; Davenport, Karen W.; Li, Po-E; ...
2015-03-19
We report here the genome sequence of Thauera sp. strain SWB20, isolated from a Singaporean wastewater treatment facility using gel microdroplets (GMDs) and single-cell genomics (SCG). This approach provided a single clonal microcolony that was sufficient to obtain a 4.9-Mbp genome assembly of an ecologically relevant Thauera species.
Li, Xi; Zhu, Yongze; Shen, Mengyuan; Du, Jing; Zhang, Lei; Wang, Dairong
2018-03-01
Enterobacter cloacae is one of the major pathogens responsible for a variety of human infections. Here we report the draft genome sequence of multidrug-resistant E. cloacae strain HBY isolated from a female patient in China. Whole genomic DNA of E. cloacae strain HBY was extracted and sequenced using an Illumina HiSeq™ 2000 platform. The generated sequence reads were assembled using CLC Genomics Workbench. The draft genome was annotated using Rapid Annotations using Subsystems Technology (RAST), and antimicrobial resistance genes were identified. The 5,799,439-bp genome contains various antimicrobial resistance genes conferring resistance to aminoglycosides, β-lactams, fosfomycin, macrolides, sulphonamides and fluoroquinolones. Notably, the strain was found to carry two main carbapenemase genes (blaKPC-2 and blaNDM-1). The genome sequence reported in this study will provide valuable information to understand antibiotic resistance mechanisms in this strain. It is important to monitor the spread of Enterobacter sp. strains encoding both of these carbapenemase genes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Draft genome of the protandrous Chinese black porgy, Acanthopagrus schlegelii.
Zhang, Zhiyong; Zhang, Kai; Chen, Shuyin; Zhang, Zhiwei; Zhang, Jinyong; You, Xinxin; Bian, Chao; Xu, Jin; Jia, Chaofeng; Qiang, Jun; Zhu, Fei; Li, Hongxia; Liu, Hailin; Shen, Dehua; Ren, Zhonghong; Chen, Jieming; Li, Jia; Gao, Tianheng; Gu, Ruobo; Xu, Junmin; Shi, Qiong; Xu, Pao
2018-04-01
As one of the most popular and valuable commercial marine fishes in China and East Asian countries, the Chinese black porgy (Acanthopagrus schlegelii), also known as the blackhead seabream, has some attractive characteristics such as fast growth rate, good meat quality, resistance to diseases, and excellent adaptability to various environments. Furthermore, the black porgy is a good model for investigating sex changes in fish due to its protandrous hermaphroditism. Here, we obtained a high-quality genome assembly of this interesting teleost species and performed a genomic survey on potential genes associated with the sex-change phenomenon. We generated 175.4 gigabases (Gb) of clean sequence reads using a whole-genome shotgun sequencing strategy. The final genome assembly is approximately 688.1 megabases (Mb), accounting for 93% of the estimated genome size (739.6 Mb). The achieved scaffold N50 is 7.6 Mb, reaching a relatively high level among sequenced fish species. We identified 19 465 protein-coding genes, which had an average transcript length of 17.3 kb. By performing a comparative genomic analysis, we found 3 types of genes potentially associated with sex change, which are useful for studying the genetic basis of the protandrous hermaphroditism. We provide a draft genome assembly of the Chinese black porgy and discuss the potential genetic mechanisms of sex change. These data are also an important resource for studying the biology and for facilitating breeding of this economically important fish.
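The 93% figure in the record above follows directly from the two sizes it reports (688.1 Mb assembled, 739.6 Mb estimated). A small sketch of that arithmetic is given here for readers who want to check similar claims in other records; the helper name `assembled_fraction` is an illustrative assumption, not part of any published tool.

```python
def assembled_fraction(assembled_mb: float, estimated_mb: float) -> float:
    """Return the assembled size as a fraction of the estimated genome size."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembled_mb / estimated_mb


# Sizes as reported in the abstract above, in megabases.
fraction = assembled_fraction(688.1, 739.6)
print(f"{fraction:.1%}")  # 93.0%
```

This ratio measures only how much sequence was assembled, so it says nothing about contiguity or gene completeness, which records in this listing report separately through N50 and BUSCO scores.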
Draft sequencing and analysis of the genome of pufferfish Takifugu flavidus.
Gao, Yang; Gao, Qiang; Zhang, Huan; Wang, Lingling; Zhang, Fuchong; Yang, Chuanyan; Song, Linsheng
2014-12-01
The pufferfish Takifugu flavidus is an important economic species due to its outstanding flavour and high market value. It has also been regarded as an excellent model for genetic study for decades. In the present study, three mate-pair libraries of the T. flavidus genome were sequenced on the SOLiD 4 next-generation sequencing platform, and the draft genome was constructed from the short reads using an assisted assembly strategy. The draft consists of 50,947 scaffolds with an N50 value of 305.7 kb, and the average GC content was 45.2%. The combined length of repetitive sequences was 26.5 Mb, which accounted for 6.87% of the genome, indicating that the compactness of the T. flavidus genome was similar to that of the T. rubripes genome. A total of 1,253 non-coding RNA genes and 30,285 protein-encoding genes were assigned to the genome. There were 132, 775 and 394 presumptive genes playing roles in the colour pattern variation, the relatively slow growth and the lipid metabolism, respectively. Among them, genes involved in the microtubule-dependent transport system, angiogenesis, decapentaplegic pathway and lipid mobilization were significantly expanded in the T. flavidus genome. This draft genome provides a valuable resource for understanding and improving both fundamental and applied research with pufferfish in the future. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Thompson, Peter C; Zarlenga, Dante S; Liu, Ming-Yuan; Rosenthal, Benjamin M
2017-09-01
Genome assemblies can form the basis of comparative analyses fostering insight into the evolutionary genetics of a parasite's pathogenicity, host-pathogen interactions, environmental constraints and invasion biology; however, the length and complexity of many parasite genomes has hampered the development of well-resolved assemblies. In order to improve Trichinella genome assemblies, the genome of the sylvatic encapsulated species Trichinella murrelli was sequenced using third-generation, long-read technology and, using syntenic comparisons, scaffolded to a reference genome assembly of Trichinella spiralis, markedly improving both. A high-quality draft assembly for T. murrelli was achieved that totalled 63·2 Mbp, half of which was condensed into 26 contigs each longer than 571 000 bp. When compared with previous assemblies for parasites in the genus, ours required 10-fold fewer contigs, which were five times longer, on average. Better assembly across repetitive regions also enabled resolution of 8 Mbp of previously indeterminate sequence. Furthermore, syntenic comparisons identified widespread scaffold misassemblies in the T. spiralis reference genome. The two new assemblies, organized for the first time into three chromosomal scaffolds, will be valuable resources for future studies linking phenotypic traits within each species to their underlying genetic bases.
Garza-Ramos, Ulises; Tamayo-Legorreta, Elsa; Arellano-Quintanilla, Doris María; Rodriguez-Medina, Nadia; Silva-Sanchez, Jesús; Catalan-Najera, Juan; Rocha-Martínez, Marisol Karina; Bravo-Díaz, María Asunción; Alpuche-Aranda, Celia
2018-03-08
A colistin-resistant, mcr-1-carrying Escherichia coli strain, RC2-007, was isolated from a swine farm in Mexico. This extraintestinal and uropathogenic strain of E. coli belongs to serotype O89:H9 and sequence type 744. Assembly and annotation resulted in a 4.9-Mb draft genome that revealed the presence of plasmid-mediated mcr-1-ISApl1 genes as part of a prophage. Copyright © 2018 Garza-Ramos et al.
The genome of the fire ant Solenopsis invicta
USDA-ARS's Scientific Manuscript database
Ants have evolved very complex societies and are key ecosystem members. Some of them are also major pests, as exemplified by the fire ant Solenopsis invicta. We present here the draft genome of S. invicta, assembled from 454 and Illumina reads obtained from a focal haploid male and his brothers. In ...
Das, Abhishek; Panda, Arijit; Singh, Deeksha; Chandrababunaidu, Mathu Malar; Mishra, Gyan Prakash; Bhan, Sushma
2015-01-01
Scytonema tolypothrichoides VB-61278, a terrestrial cyanobacterium, can be exploited to produce commercially important products. Here, we report for the first time a 10-Mb draft genome assembly of S. tolypothrichoides VB-61278, with 214 scaffolds and 7,148 putative protein-coding genes. PMID:25838486
Kim, Hyung Jun; Jang, Soojin
2017-12-01
Staphylococcus haemolyticus is the second most frequently isolated coagulase-negative staphylococci from blood cultures. Moreover, multidrug resistance associated with the genome flexibility of S. haemolyticus has been increasingly reported worldwide. Here we report the draft genome sequence of multidrug-resistant S. haemolyticus IPK_TSA25 isolated from a building surface in South Korea. Genomic DNA of S. haemolyticus IPK_TSA25 was sequenced using the PacBio RS II sequencing platform. Generated reads were assembled using PacBio SMRT Analysis 2.3.0. The draft genome was annotated and antibiotic resistance genes were identified. The genome of 2517398bp contains various antibiotic resistance genes associated with resistance to β-lactams, aminoglycosides and macrolides. Genome analysis also revealed chromosomal integration of the full-length Staphylococcus aureus plasmid pS0385-1 containing a tetracycline resistance gene. The genome sequence reported in this study will provide valuable information to understand the flexibility of the S. haemolyticus genome, which facilitates acquisition of antibiotic resistance genes and contributes to the dissemination of antibiotic resistance by this emerging pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Pasquali, Frédérique; Palma, Federica; Guillier, Laurent; Lucchi, Alex; De Cesare, Alessandra; Manfreda, Gerardo
2018-01-01
Listeria monocytogenes is a foodborne pathogen adapted to survive and persist in multiple environments. Following two previous studies on the prevalence and virulence of L. monocytogenes ST121 and ST14 repeatedly collected in the same rabbit-meat processing plant, the research questions of the present study were to: (1) assess persistence of L. monocytogenes isolates from the rabbit-meat plant; (2) select genes associated with physiological adaptation to the food-processing environment; (3) compare presence/absence/truncation of these genes in newly sequenced and publicly available ST121 and ST14 genomes. A total of 273 draft genomes, including newly sequenced and publicly available ST121 and ST14 draft genomes, were analyzed. Whole-genome single nucleotide polymorphism (wgSNP) analysis was performed separately on the assemblies of ST121 and ST14 draft genomes. SNP alignments were used to infer phylogeny. A dataset of L. monocytogenes ecophysiology genes was built based on a comprehensive literature review. The 94 selected genes were screened on the assemblies of all ST121 and ST14 draft genomes. Significant gene enrichments were evaluated by statistical analyses. A persistent ST14 clone, including 23 out of 27 newly sequenced genomes, was circulating in the rabbit-meat plant along with two non-persistent clones. A significant enrichment was observed in ST121 genomes concerning stress survival islet 2 (SSI-2) (alkaline and oxidative stress), the qacH gene (resistance to benzalkonium chloride), the cadA1C gene cassette (resistance to 70 mg/l of cadmium chloride) and a truncated version of the actA gene (biofilm formation). Conversely, ST14 draft genomes were enriched with a full-length version of the actA gene along with the Listeria Genomic Island 2 (LGI 2), including the ars operon (arsenic resistance) and the cadA4C gene cassette (resistance to 35 mg/l of cadmium chloride). Phenotypic tests confirmed ST121 as a weak biofilm producer in comparison to ST14.
In conclusion, ST121 carried the qacH gene and was phenotypically resistant to quaternary ammonium compounds. This property might contribute to the high prevalence of ST121 in food processing plants. ST14 showed greater ability to form biofilms, which might contribute to the occasional colonization and persistence on harborage sites where sanitizing procedures are difficult to display. PMID:29662481
Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J
2018-04-16
Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second A. chinensis genome (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein in the 'Hongyang' annotation from The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing them with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes.
Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.
Genome sequence and genetic diversity of the common carp, Cyprinus carpio.
Xu, Peng; Zhang, Xiaofeng; Wang, Xumin; Li, Jiongtang; Liu, Guiming; Kuang, Youyi; Xu, Jian; Zheng, Xianhu; Ren, Lufeng; Wang, Guoliang; Zhang, Yan; Huo, Linhe; Zhao, Zixia; Cao, Dingchen; Lu, Cuiyun; Li, Chao; Zhou, Yi; Liu, Zhanjiang; Fan, Zhonghua; Shan, Guangle; Li, Xingang; Wu, Shuangxiu; Song, Lipu; Hou, Guangyuan; Jiang, Yanliang; Jeney, Zsigmond; Yu, Dan; Wang, Li; Shao, Changjun; Song, Lai; Sun, Jing; Ji, Peifeng; Wang, Jian; Li, Qiang; Xu, Liming; Sun, Fanyue; Feng, Jianxin; Wang, Chenghui; Wang, Shaolin; Wang, Baosen; Li, Yan; Zhu, Yaping; Xue, Wei; Zhao, Lan; Wang, Jintu; Gu, Ying; Lv, Weihua; Wu, Kejing; Xiao, Jingfa; Wu, Jiayan; Zhang, Zhang; Yu, Jun; Sun, Xiaowen
2014-11-01
The common carp, Cyprinus carpio, is one of the most important cyprinid species and globally accounts for 10% of freshwater aquaculture production. Here we present a draft genome of domesticated C. carpio (strain Songpu), whose current assembly contains 52,610 protein-coding genes and approximately 92.3% coverage of its paleotetraploidized genome (2n = 100). The latest round of whole-genome duplication has been estimated to have occurred approximately 8.2 million years ago. Genome resequencing of 33 representative individuals from worldwide populations demonstrates a single origin for C. carpio in 2 subspecies (C. carpio Haematopterus and C. carpio carpio). Integrative genomic and transcriptomic analyses were used to identify loci potentially associated with traits including scaling patterns and skin color. In combination with the high-resolution genetic map, the draft genome paves the way for better molecular studies and improved genome-assisted breeding of C. carpio and other closely related species.
A draft sequence of the rice genome (Oryza sativa L. ssp. indica).
Yu, Jun; Hu, Songnian; Wang, Jun; Wong, Gane Ka-Shu; Li, Songgang; Liu, Bin; Deng, Yajun; Dai, Li; Zhou, Yan; Zhang, Xiuqing; Cao, Mengliang; Liu, Jing; Sun, Jiandong; Tang, Jiabin; Chen, Yanjiong; Huang, Xiaobing; Lin, Wei; Ye, Chen; Tong, Wei; Cong, Lijuan; Geng, Jianing; Han, Yujun; Li, Lin; Li, Wei; Hu, Guangqiang; Huang, Xiangang; Li, Wenjie; Li, Jian; Liu, Zhanwei; Li, Long; Liu, Jianping; Qi, Qiuhui; Liu, Jinsong; Li, Li; Li, Tao; Wang, Xuegang; Lu, Hong; Wu, Tingting; Zhu, Miao; Ni, Peixiang; Han, Hua; Dong, Wei; Ren, Xiaoyu; Feng, Xiaoli; Cui, Peng; Li, Xianran; Wang, Hao; Xu, Xin; Zhai, Wenxue; Xu, Zhao; Zhang, Jinsong; He, Sijie; Zhang, Jianguo; Xu, Jichen; Zhang, Kunlin; Zheng, Xianwu; Dong, Jianhai; Zeng, Wanyong; Tao, Lin; Ye, Jia; Tan, Jun; Ren, Xide; Chen, Xuewei; He, Jun; Liu, Daofeng; Tian, Wei; Tian, Chaoguang; Xia, Hongai; Bao, Qiyu; Li, Gang; Gao, Hui; Cao, Ting; Wang, Juan; Zhao, Wenming; Li, Ping; Chen, Wei; Wang, Xudong; Zhang, Yong; Hu, Jianfei; Wang, Jing; Liu, Song; Yang, Jian; Zhang, Guangyu; Xiong, Yuqing; Li, Zhijie; Mao, Long; Zhou, Chengshu; Zhu, Zhen; Chen, Runsheng; Hao, Bailin; Zheng, Weimou; Chen, Shouyi; Guo, Wei; Li, Guojie; Liu, Siqi; Tao, Ming; Wang, Jian; Zhu, Lihuang; Yuan, Longping; Yang, Huanming
2002-04-05
We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
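The "exact 20-nucleotide oligomer repeats" figure in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the function name `repeat_fraction`, the single-strand treatment (no reverse complements), and the toy sequences are assumptions made here for illustration only. It reports the fraction of positions covered by at least one k-mer that occurs more than once.

```python
from collections import Counter


def repeat_fraction(seq, k=20):
    """Fraction of positions in seq covered by a k-mer seen more than once.

    Assumption: only the given strand is considered; a real analysis would
    likely also count reverse complements and use a disk-backed k-mer counter.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        return 0.0
    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    covered = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] > 1:
            # Mark all k positions spanned by this repeated k-mer.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / n


if __name__ == "__main__":
    # Toy example with k=3: "ACG" occurs twice, covering 6 of 9 positions.
    print(repeat_fraction("ACGTTTACG", k=3))
```

Holding all k-mers in memory is only practical for small inputs; the sketch is meant to make the definition concrete rather than to scale to a 466-megabase genome.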
The genome of Theobroma cacao.
Argout, Xavier; Salse, Jerome; Aury, Jean-Marc; Guiltinan, Mark J; Droc, Gaetan; Gouzy, Jerome; Allegre, Mathilde; Chaparro, Cristian; Legavre, Thierry; Maximova, Siela N; Abrouk, Michael; Murat, Florent; Fouet, Olivier; Poulain, Julie; Ruiz, Manuel; Roguet, Yolande; Rodier-Goud, Maguy; Barbosa-Neto, Jose Fernandes; Sabot, Francois; Kudrna, Dave; Ammiraju, Jetty Siva S; Schuster, Stephan C; Carlson, John E; Sallet, Erika; Schiex, Thomas; Dievart, Anne; Kramer, Melissa; Gelley, Laura; Shi, Zi; Bérard, Aurélie; Viot, Christopher; Boccara, Michel; Risterucci, Ange Marie; Guignon, Valentin; Sabau, Xavier; Axtell, Michael J; Ma, Zhaorong; Zhang, Yufan; Brown, Spencer; Bourge, Mickael; Golser, Wolfgang; Song, Xiang; Clement, Didier; Rivallan, Ronan; Tahi, Mathias; Akaza, Joseph Moroh; Pitollat, Bertrand; Gramacho, Karina; D'Hont, Angélique; Brunel, Dominique; Infante, Diogenes; Kebe, Ismael; Costet, Pierre; Wing, Rod; McCombie, W Richard; Guiderdoni, Emmanuel; Quetier, Francis; Panaud, Olivier; Wincker, Patrick; Bocs, Stephanie; Lanaud, Claire
2011-02-01
We sequenced and assembled the draft genome of Theobroma cacao, an economically important tropical-fruit tree crop that is the source of chocolate. This assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of these genes anchored on the 10 T. cacao chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example, flavonoid-related genes. It also provides a major source of candidate genes for T. cacao improvement. Based on the inferred paleohistory of the T. cacao genome, we propose an evolutionary scenario whereby the ten T. cacao chromosomes were shaped from an ancestor through eleven chromosome fusions.
Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M
2018-02-14
Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis).
Lok, Si; Paton, Tara A; Wang, Zhuozhi; Kaur, Gaganjot; Walker, Susan; Yuen, Ryan K C; Sung, Wilson W L; Whitney, Joseph; Buchanan, Janet A; Trost, Brett; Singh, Naina; Apresto, Beverly; Chen, Nan; Coole, Matthew; Dawson, Travis J; Ho, Karen; Hu, Zhizhou; Pullenayegum, Sanjeev; Samler, Kozue; Shipstone, Arun; Tsoi, Fiona; Wang, Ting; Pereira, Sergio L; Rostami, Pirooz; Ryan, Carol Ann; Tong, Amy Hin Yan; Ng, Karen; Sundaravadanam, Yogi; Simpson, Jared T; Lim, Burton K; Engstrom, Mark D; Dutton, Christopher J; Kerr, Kevin C R; Franke, Maria; Rapley, William; Wintle, Richard F; Scherer, Stephen W
2017-02-09
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. Copyright © 2017 Lok et al.
De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)
Lok, Si; Paton, Tara A.; Wang, Zhuozhi; Kaur, Gaganjot; Walker, Susan; Yuen, Ryan K. C.; Sung, Wilson W. L.; Whitney, Joseph; Buchanan, Janet A.; Trost, Brett; Singh, Naina; Apresto, Beverly; Chen, Nan; Coole, Matthew; Dawson, Travis J.; Ho, Karen; Hu, Zhizhou; Pullenayegum, Sanjeev; Samler, Kozue; Shipstone, Arun; Tsoi, Fiona; Wang, Ting; Pereira, Sergio L.; Rostami, Pirooz; Ryan, Carol Ann; Tong, Amy Hin Yan; Ng, Karen; Sundaravadanam, Yogi; Simpson, Jared T.; Lim, Burton K.; Engstrom, Mark D.; Dutton, Christopher J.; Kerr, Kevin C. R.; Franke, Maria; Rapley, William; Wintle, Richard F.; Scherer, Stephen W.
2017-01-01
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. PMID:28087693
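Several of the abstracts in this listing report contig or scaffold N50 values. As a minimal sketch of how that statistic is commonly defined (the length L such that sequences of length L or longer contain at least half of the total assembled bases), the following may help. The function name `n50` and the toy lengths are hypothetical and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    combined length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), for illustration only.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, a high value says nothing about misassemblies, which is why the abstracts above pair it with checks such as BUSCO completeness or exon placement.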
Bringing the fathead minnow (Pimephales promelas) into the ...
The fathead minnow (Pimephales promelas) is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. Throughout this time, a lot of knowledge has been gained about the fathead minnow’s biological responses to various xenobiotics. However, despite its importance as a model organism, the fathead minnow still has few publicly available gene sequences. Recently, Burns et al. (2015; Environ. Toxicol. Chem. 35:212) described the sequencing and de-novo assembly of the fathead minnow genome. Two draft genome assemblies are now publicly available on the GenBank database. However, on their own the draft assemblies remain of limited use to researchers who are primarily interested in the functional units of the genome, i.e. the genes. In the present study, an annotation pipeline, consisting of gene prediction, evidence alignment, and data synthesis, was applied to the fathead minnow SOAPdenovo assembly. Ab initio gene prediction was performed using AUGUSTUS, which provided a starting point of 43,345 gene predictions. Fathead minnow Expressed Sequence Tags (ESTs) and zebrafish protein-coding sequences (CDSs) were then aligned to the assembly using the corresponding spliced alignment methods of the program Exonerate. Of the over 240,000 EST alignments, 73% were successfully aligned with 90% or greater sequence identity and query coverage. Similarly, 39% of nearly 45,000 zebrafish co
Das, Abhishek; Panda, Arijit; Singh, Deeksha; Chandrababunaidu, Mathu Malar; Mishra, Gyan Prakash; Bhan, Sushma; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-04-02
Scytonema tolypothrichoides VB-61278, a terrestrial cyanobacterium, can be exploited to produce commercially important products. Here, we report for the first time a 10-Mb draft genome assembly of S. tolypothrichoides VB-61278, with 214 scaffolds and 7,148 putative protein-coding genes. Copyright © 2015 Das et al.
Gonzalez-Esquer, C. Raul; Twary, Scott N.; Hovde, Blake T.; ...
2018-01-25
Picochlorum soloecismus is a halotolerant, fast-growing, and moderate-lipid-producing microalga that is being evaluated as a renewable feedstock for biofuel production. Herein, we report on an improved high-quality draft assembly and annotation for the nuclear, chloroplast, and mitochondrial genomes of P. soloecismus DOE 101.
The value of new genome references.
Worley, Kim C; Richards, Stephen; Rogers, Jeffrey
2017-09-15
Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly has had on comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.
Jang, Hyein; Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R
2018-04-12
Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%.
USDA-ARS's Scientific Manuscript database
This study reports a de novo assembled draft genome sequence of Xylella fastidiosa subsp. multiplex strain BB01 causing blueberry bacterial leaf scorch in Georgia, USA. The BB01 genome is 2,517,579 bp with a G+C content of 51.8% and 2,943 open reading frames (ORFs) and 48 RNA genes....
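Genome announcements such as the ones above routinely quote a G+C content. A minimal sketch of that calculation over a set of contigs follows; the names `gc_content` and `assembly_gc` are hypothetical, and ignoring ambiguous bases such as N is an assumption made here, not something the abstracts specify.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) in one sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


def assembly_gc(contigs):
    """Percent G+C pooled over all contigs, so long contigs weigh more."""
    # Joining the contigs pools their base counts; the per-base counting in
    # gc_content is unaffected by contig boundaries.
    return gc_content("".join(contigs))


if __name__ == "__main__":
    # Hypothetical toy contigs, for illustration only.
    print(assembly_gc(["GGCC", "AATT", "GCNNAT"]))
```

Pooling bases across contigs, rather than averaging per-contig percentages, keeps short contigs from distorting the genome-wide figure.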
Jiang, Yujia; Lu, Jiasheng; Chen, Tianpeng; Yan, Wei; Dong, Weiliang; Zhou, Jie; Zhang, Wenming; Ma, Jiangfeng; Jiang, Min; Xin, Fengxue
2018-05-23
A novel butanogenic strain, Clostridium sp. NJ4, was isolated and characterized; it can directly produce a relatively high titer of butanol from inulin through consolidated bioprocessing (CBP). The assembled draft genome of strain NJ4 is 4.09 Mb, containing 3891 predicted protein-coding sequences with a G+C content of 30.73%. Among these annotated genes, a levanase, a hypothetical inulinase, and two bifunctional alcohol/aldehyde dehydrogenases (AdhE) were found to play key roles in achieving ABE production from inulin through CBP.
Draft genome sequence of the rubber tree Hevea brasiliensis.
Rahman, Ahmad Yamin Abdul; Usharraj, Abhilash O; Misra, Biswapriya B; Thottathil, Gincy P; Jayasekaran, Kandakumar; Feng, Yun; Hou, Shaobin; Ong, Su Yean; Ng, Fui Ling; Lee, Ling Sze; Tan, Hock Siew; Sakaff, Muhd Khairul Luqman Muhd; Teh, Beng Soon; Khoo, Bee Feong; Badai, Siti Suriawati; Aziz, Nurohaida Ab; Yuryev, Anton; Knudsen, Bjarne; Dionne-Laporte, Alexandre; Mchunu, Nokuthula P; Yu, Qingyi; Langston, Brennick J; Freitas, Tracey Allen K; Young, Aaron G; Chen, Rui; Wang, Lei; Najimudin, Nazalan; Saito, Jennifer A; Alam, Maqsudul
2013-02-02
Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.
Draft genome sequence of the rubber tree Hevea brasiliensis
2013-01-01
Background: Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results: Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions: The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber. PMID:23375136
Sánchez-Nieves, Rubén; Facciotti, Marc T; Saavedra-Collado, Sofía; Dávila-Santiago, Lizbeth; Rodríguez-Carrero, Roy; Montalvo-Rodríguez, Rafael
2016-03-01
The genus Halorubrum is a member of the family Halobacteriaceae and currently has the highest number of described species (31) of all the haloarchaea. Here we report the draft genome sequence of strain V5, a new species within this genus that was isolated from the solar salterns of Cabo Rojo, Puerto Rico. Assembly rendered the genome into 17 contigs (N50 = 515,834 bp), the largest of which contains 1,031,026 bp. The genome is 3.57 Mb in length with a G + C content of 67.6%. In general, the genome includes 4 rRNAs, 52 tRNAs, and 3246 protein-coding sequences. The NCBI accession number for this genome is LIST00000000 and the strain deposit number is CECT9000.
Icarus: visualizer for de novo assembly evaluation.
Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey; Saveliev, Vladislav; Gurevich, Alexey
2016-11-01
Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from the visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose. Here, we present Icarus, a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used in studies where a related reference genome is available, as well as for non-model organisms. The tool is available online and as a standalone application. http://cab.spbu.ru/software/icarus Contact: aleksey.gurevich@spbu.ru. Supplementary information: Supplementary data are available at Bioinformatics online.
Why Assembling Plant Genome Sequences Is So Challenging
Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé
2012-01-01
In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233
USDA-ARS's Scientific Manuscript database
Clostridium perfringens strain LLY_N11 is a commensal bacterial isolate from a healthy chicken that produced a necrotic enteritis in experimental studies. Here we present the assembly and annotation of its genome, which may provide further insights into improved understanding of the molecular mechan...
Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D.; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R.
2018-01-01
Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%. PMID:29650569
Draft genome of the red harvester ant Pogonomyrmex barbatus.
Smith, Chris R; Smith, Christopher D; Robertson, Hugh M; Helmkampf, Martin; Zimin, Aleksey; Yandell, Mark; Holt, Carson; Hu, Hao; Abouheif, Ehab; Benton, Richard; Cash, Elizabeth; Croset, Vincent; Currie, Cameron R; Elhaik, Eran; Elsik, Christine G; Favé, Marie-Julie; Fernandes, Vilaiwan; Gibson, Joshua D; Graur, Dan; Gronenberg, Wulfila; Grubbs, Kirk J; Hagen, Darren E; Viniegra, Ana Sofia Ibarraran; Johnson, Brian R; Johnson, Reed M; Khila, Abderrahman; Kim, Jay W; Mathis, Kaitlyn A; Munoz-Torres, Monica C; Murphy, Marguerite C; Mustard, Julie A; Nakamura, Rin; Niehuis, Oliver; Nigam, Surabhi; Overson, Rick P; Placek, Jennifer E; Rajakumar, Rajendhran; Reese, Justin T; Suen, Garret; Tao, Shu; Torres, Candice W; Tsutsui, Neil D; Viljakainen, Lumi; Wolschin, Florian; Gadau, Jürgen
2011-04-05
We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
GapBlaster-A Graphical Gap Filler for Prokaryote Genomes.
de Sá, Pablo H C G; Miranda, Fábio; Veras, Adonney; de Melo, Diego Magalhães; Soares, Siomar; Pinheiro, Kenny; Guimarães, Luis; Azevedo, Vasco; Silva, Artur; Ramos, Rommel T J
2016-01-01
The advent of NGS (Next Generation Sequencing) technologies has resulted in an exponential increase in the number of complete genomes available in biological databases. This advance has allowed the development of several computational tools enabling analyses of large amounts of data in each of the various steps, from processing and quality filtering to gap filling and manual curation. The tools developed for gap closure are very useful as they result in more complete genomes, which will influence downstream analyses of genomic plasticity and comparative genomics. However, the gap filling step remains a challenge for genome assembly, often requiring manual intervention. Here, we present GapBlaster, a graphical application to evaluate and close gaps. GapBlaster was developed via Java programming language. The software uses contigs obtained in the assembly of the genome to perform an alignment against a draft of the genome/scaffold, using BLAST or Mummer to close gaps. Then, all identified alignments of contigs that extend through the gaps in the draft sequence are presented to the user for further evaluation via the GapBlaster graphical interface. GapBlaster presents significant results compared to other similar software and has the advantage of offering a graphical interface for manual curation of the gaps. GapBlaster program, the user guide and the test datasets are freely available at https://sourceforge.net/projects/gapblaster2015/. It requires Sun JDK 8 and Blast or Mummer.
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
2007-01-01
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
Fernandes, Miriam R; Sellera, Fábio P; Moura, Quézia; Souza, Tiago A; Lincopan, Nilton
2018-03-01
Asymptomatic carriers can act as reservoirs of multidrug-resistant (MDR) bacteria. The aim of this study was to describe the draft genome sequence of an MDR Escherichia coli lineage recovered from a faecal sample of a healthy carrier. Genomic DNA was sequenced on an Illumina NextSeq platform. Sequence reads were de novo assembled using CLC Genomics Workbench and the whole genome sequence was evaluated through bioinformatics tools available from the Center of Genomic Epidemiology as well as additional in silico analysis. The genome size was calculated as 5178340 bp, with 5442 protein-coding sequences and 5492 total genes. Presence of the blaCTX-M-8, blaCTX-M-55 and fosA3 genes was detected in addition to other antimicrobial resistance genes. Interestingly, the strain was assigned to serotype O8:H4-fimH97 and was classified within the highly virulent phylogroup B2. This draft genome can provide helpful information to elucidate genetic features that contribute to colonisation and adaptation of MDR and virulent pathogens in asymptomatic carriers.
Single haplotype assembly of the human genome from a hydatidiform mole.
Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K
2014-12-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
Single haplotype assembly of the human genome from a hydatidiform mole
Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
2014-01-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M
2018-06-15
It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.
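The overlap step of the overlap-layout-consensus paradigm mentioned above can be illustrated with a toy example. The Python sketch below finds the longest exact suffix-prefix overlap between two reads and merges them. It is not BAUM's implementation: the function names, the `min_len` threshold, and the two reads are hypothetical, and real assemblers tolerate mismatches and use indexed data structures rather than this brute-force scan.

```python
def longest_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` (>= 1) bases exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_reads(a: str, b: str, min_len: int = 3) -> str:
    """Merge `b` onto the end of `a` using their exact suffix-prefix overlap."""
    k = longest_overlap(a, b, min_len)
    if k == 0:
        raise ValueError("reads do not overlap by at least min_len bases")
    return a + b[k:]


# Hypothetical reads that share the 4-base overlap "GACC".
read_a = "ATTAGACC"
read_b = "GACCTTA"
print(longest_overlap(read_a, read_b))
print(merge_reads(read_a, read_b))
```

Repeats are what make this hard in practice: when the same overlap occurs at several places in a genome, an exact-match rule like this one cannot tell which pair of reads truly belongs together, which is the uncertainty the abstract refers to.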
Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana
Radakovits, Randor; Jinkerson, Robert E.; Fuerstenberg, Susan I.; Tae, Hongseok; Settlage, Robert E.; Boore, Jeffrey L.; Posewitz, Matthew C.
2012-01-01
The potential use of algae in biofuels applications is receiving significant attention. However, none of the current algal model species are competitive production strains. Here we present a draft genome sequence and a genetic transformation method for the marine microalga Nannochloropsis gaditana CCMP526. We show that N. gaditana has highly favourable lipid yields, and is a promising production organism. The genome assembly includes nuclear (~29 Mb) and organellar genomes, and contains 9,052 gene models. We define the genes required for glycerolipid biogenesis and detail the differential regulation of genes during nitrogen-limited lipid biosynthesis. Phylogenomic analysis identifies genetic attributes of this organism, including unique stramenopile photosynthesis genes and gene expansions that may explain the distinguishing photoautotrophic phenotypes observed. The availability of a genome sequence and transformation methods will facilitate investigations into N. gaditana lipid biosynthesis and permit genetic engineering strategies to further improve this naturally productive alga. PMID:22353717
Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana.
Radakovits, Randor; Jinkerson, Robert E; Fuerstenberg, Susan I; Tae, Hongseok; Settlage, Robert E; Boore, Jeffrey L; Posewitz, Matthew C
2012-02-21
The potential use of algae in biofuels applications is receiving significant attention. However, none of the current algal model species are competitive production strains. Here we present a draft genome sequence and a genetic transformation method for the marine microalga Nannochloropsis gaditana CCMP526. We show that N. gaditana has highly favourable lipid yields, and is a promising production organism. The genome assembly includes nuclear (~29 Mb) and organellar genomes, and contains 9,052 gene models. We define the genes required for glycerolipid biogenesis and detail the differential regulation of genes during nitrogen-limited lipid biosynthesis. Phylogenomic analysis identifies genetic attributes of this organism, including unique stramenopile photosynthesis genes and gene expansions that may explain the distinguishing photoautotrophic phenotypes observed. The availability of a genome sequence and transformation methods will facilitate investigations into N. gaditana lipid biosynthesis and permit genetic engineering strategies to further improve this naturally productive alga.
Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica).
Li, Xuewei; Kui, Ling; Zhang, Jing; Xie, Yinpeng; Wang, Liping; Yan, Yan; Wang, Na; Xu, Jidi; Li, Cuiying; Wang, Wen; van Nocker, Steve; Dong, Yang; Ma, Fengwang; Guan, Qingmei
2016-08-08
Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9× genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89% of the non-repetitive portion of the genome, and has a relatively short (16.7 kb) contig N50 length. These downsides make it difficult to apply this reference in transcriptive or whole-genome re-sequencing analyses. Here we present an improved hybrid de novo genomic assembly of apple (Golden Delicious), which was obtained from 76 Gb (~102× genome coverage) Illumina HiSeq data and 21.7 Gb (~29× genome coverage) PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90% of the estimated genome. The contig N50 size is 111,619 bp, representing a 7-fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes. The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is not only suitable for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
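The coverage and contiguity figures in this abstract are related by simple arithmetic: fold coverage is sequenced bases divided by genome size, and the N50 improvement is the ratio of the new to the old contig N50. The Python sketch below reworks the reported numbers under that assumption. The function name and variable names are illustrative, and because the published values are rounded, the derived genome sizes are approximate and need not agree with one another.

```python
def implied_genome_size(sequenced_bases: float, fold_coverage: float) -> float:
    """Genome size implied by a data volume and its stated fold coverage."""
    return sequenced_bases / fold_coverage


# Values as reported in the abstract (rounded by the authors).
illumina_size = implied_genome_size(76e9, 102)    # 76 Gb at ~102x
pacbio_size = implied_genome_size(21.7e9, 29)     # 21.7 Gb at ~29x
estimated_size = 632.4e6 / 0.90                   # 632.4 Mb is ~90% of the genome
n50_gain = 111_619 / 16_700                       # new vs. previous contig N50

print(f"Illumina-implied genome size: {illumina_size / 1e6:.1f} Mb")
print(f"PacBio-implied genome size:   {pacbio_size / 1e6:.1f} Mb")
print(f"Size implied by 90% figure:   {estimated_size / 1e6:.1f} Mb")
print(f"Contig N50 improvement:       {n50_gain:.1f}-fold")
```

The N50 ratio comes out at about 6.7, which is consistent with the abstract's rounded "7-fold" statement.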
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons
2011-01-01
Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. 
Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.
Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A
2011-08-08
Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. 
There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzalez-Esquer, C. Raul; Twary, Scott N.; Hovde, Blake T.
Picochlorum soloecismus is a halotolerant, fast-growing, and moderate-lipid-producing microalga that is being evaluated as a renewable feedstock for biofuel production. Herein, we report on an improved high-quality draft assembly and annotation for the nuclear, chloroplast, and mitochondrial genomes of P. soloecismus DOE 101.
Radford, Devon R; Leon-Velarde, Carlos G; Chen, Shu; Hamidi Oskouei, Amir M; Balamurugan, Sampathkumar
2018-03-29
The genomes of two strains of Salmonella enterica subsp. enterica serovar Cubana and serovar Muenchen, isolated from dry hazelnuts and chia seeds, respectively, were sequenced using the Illumina MiSeq platform, assembled de novo using the overlap-layout-consensus method, and aligned to their respective most identical sequence genome scaffolds using MUMMER and BLAST searches.
USDA-ARS's Scientific Manuscript database
‘Bacillus vanillea’ XY18T (=CGMCC 8629 T =NCCB 100507 T) was isolated from cured vanilla beans and involved in the formation of vanilla aroma compounds. A draft genome of this type strain was assembled and yielded a length of 3.72 Mbp and a GC content of 46.3%. Comparative genomic analysis with its ...
Genome Sequence Analysis of the Biogenic Amine-Degrading Strain Lactobacillus casei 5b
Ladero, Victor; Herrero-Fresno, Ana; Martinez, Noelia; del Río, Beatriz; Linares, Daniel M.; Fernández, María; Martín, María Cruz
2014-01-01
We here report a 3.02-Mbp annotated draft assembly of the Lactobacillus casei 5b genome. The sequence of this biogenic amine-degrading dairy isolate may help identify the mechanisms involved in the catabolism of biogenic amines and perhaps shed light on ways to reduce the presence of these toxic compounds in food. PMID:24435875
Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo
2017-02-01
Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant.
Li, Xi; Sun, Long; Zhu, Yongze; Shen, Mengyuan; Tu, Yuexing
2018-04-14
The emergence of carbapenem-resistant Escherichia coli has become a serious challenge to manage in the clinic because of multidrug resistance. Here we report the draft genome sequence of NDM-3-producing E. coli strain NT1 isolated from a bloodstream infection in China. Whole genomic DNA of E. coli strain NT1 was extracted and was sequenced using an Illumina HiSeq™ X Ten platform. The generated sequence reads were assembled using CLC Genomics Workbench. The draft genome was annotated using Rapid Annotation using Subsystem Technology (RAST). Bioinformatics analysis was further performed. The genome size was calculated at 5,353,620 bp, with 5297 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, quinolones, macrolides, phenicols, sulphonamides, tetracycline and trimethoprim. In addition, genes encoding virulence factors were also identified. To our knowledge, this is the first report of an E. coli strain producing NDM-3 isolated from a human bloodstream infection. The genome sequence will provide valuable information to understand antibiotic resistance mechanisms and pathogenic mechanisms in this strain. Close surveillance is urgently needed to monitor the spread of NDM-3-producing isolates.
Saha, Surya; Hunter, Wayne B; Reese, Justin; Morgan, J Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen
2012-01-01
Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.
Saha, Surya; Hunter, Wayne B.; Reese, Justin; Morgan, J. Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen
2012-01-01
Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China. PMID:23166822
Mofiz, Ehtesham; Holt, Deborah C; Seemann, Torsten; Currie, Bart J; Fischer, Katja; Papenfuss, Anthony T
2016-06-02
The scabies mite, Sarcoptes scabiei, is a parasitic arachnid and cause of the infectious skin disease scabies in humans and mange in other animal species. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where secondary group A streptococcal and Staphylococcus aureus infections of scabies sores are thought to drive the high rate of rheumatic heart disease and chronic kidney disease. We sequenced the genome of two samples of Sarcoptes scabiei var. hominis obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of Sarcoptes scabiei var. suis from a pig model. Because of the small size of the scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present Sarcoptes scabiei var. hominis and var. suis draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins. We have developed extensive genomic resources for the scabies mite, including reference genomes and a preliminary annotation.
Bian, Chao; Hu, Yinchang; Ravi, Vydianathan; Kuznetsova, Inna S.; Shen, Xueyan; Mu, Xidong; Sun, Ying; You, Xinxin; Li, Jia; Li, Xiaofeng; Qiu, Ying; Tay, Boon-Hui; Thevasagayam, Natascha May; Komissarov, Aleksey S.; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Luo, Jianren; Liu, Yi; Song, Hongmei; Liu, Chao; Wang, Xuejie; Gu, Dangen; Yang, Yexin; Li, Wujiao; Polgar, Gianluca; Fan, Guangyi; Zeng, Peng; Zhang, He; Xiong, Zijun; Tang, Zhujing; Peng, Chao; Ruan, Zhiqiang; Yu, Hui; Chen, Jieming; Fan, Mingjun; Huang, Yu; Wang, Min; Zhao, Xiaomeng; Hu, Guojun; Yang, Huanming; Wang, Jian; Wang, Jun; Xu, Xun; Song, Linsheng; Xu, Gangchun; Xu, Pao; Xu, Junmin; O’Brien, Stephen J.; Orbán, László; Venkatesh, Byrappa; Shi, Qiong
2016-01-01
The Asian arowana (Scleropages formosus), one of the world’s most expensive cultivated ornamental fishes, is an endangered species. It represents an ancient lineage of teleosts: the Osteoglossomorpha. Here, we provide a high-quality chromosome-level reference genome of a female golden-variety arowana using a combination of deep shotgun sequencing and high-resolution linkage mapping. In addition, we have also generated two draft genome assemblies for the red and green varieties. Phylogenomic analysis supports a sister group relationship between Osteoglossomorpha (bonytongues) and Elopomorpha (eels and relatives), with the two clades together forming a sister group of Clupeocephala which includes all the remaining teleosts. The arowana genome retains the full complement of eight Hox clusters unlike the African butterfly fish (Pantodon buchholzi), another bonytongue fish, which possess only five Hox clusters. Differential gene expression among three varieties provides insights into the genetic basis of colour variation. A potential heterogametic sex chromosome is identified in the female arowana karyotype, suggesting that the sex is determined by a ZW/ZZ sex chromosomal system. The high-quality reference genome of the golden arowana and the draft assemblies of the red and green varieties are valuable resources for understanding the biology, adaptation and behaviour of Asian arowanas. PMID:27089831
Ramirez-Gonzalez, Ricardo; Caccamo, Mario; MacLean, Daniel
2011-10-01
Scientists now use high-throughput sequencing technologies and short-read assembly methods to create draft genome assemblies in just days. Tools such as assemblers and workflow management environments make it easy for a non-specialist to implement complicated pipelines to produce genome assemblies and annotations very quickly. Such accessibility results in a proliferation of assemblies and associated files, often for many organisms. These assemblies get used as a working reference by many different workers, from a bioinformatician doing gene prediction to a bench scientist designing primers for PCR. Here we describe Gee Fu, a database tool for genomic assembly and feature data, including next-generation sequence alignments. Gee Fu is an instance of a Ruby-On-Rails web application on a feature database that provides web and console interfaces for input, visualization of feature data via AnnoJ, access to data through a web-service interface, an API for direct data access by Ruby scripts and access to feature data stored in BAM files. Gee Fu provides a platform for storing and sharing different versions of an assembly and associated features that can be accessed and updated by bench biologists and bioinformaticians in ways that are easy and useful for each. Availability: http://tinyurl.com/geefu. Contact: dan.maclean@tsl.ac.uk.
Draft transcriptome of Globodera ellingtonae
USDA-ARS's Scientific Manuscript database
The recently described cyst nematode species, Globodera ellingtonae, is a phylogenetic intermediary between the potato cyst nematodes (PCN), G. rostochiensis and G. pallida, and as such provides a new avenue for understanding the evolution and biology of PCN. Given that assembled genomes and transcri...
Lin, Hsin-Hung; Liao, Yu-Chieh
2015-01-01
Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) are required for self-correction and for subsequent de novo assembly. Recently-developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which implements corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome.
Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing the available hybrid datasets that were not assembled in an optimal fashion.
The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma
Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M.; Ramírez, Santiago R.
2017-01-01
Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant–insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. PMID:28701376
The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma.
Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M; Ramírez, Santiago R
2017-09-07
Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant-insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. Copyright © 2017 Brand et al.
Genome assembly and transcriptome resource for river buffalo, Bubalus bubalis (2n = 50)
Iamartino, Daniela; Pruitt, Kim D; Sonstegard, Tad; Smith, Timothy P L; Low, Wai Yee; Biagini, Tommaso; Bomba, Lorenzo; Capomaccio, Stefano; Castiglioni, Bianca; Coletta, Angelo; Corrado, Federica; Ferré, Fabrizio; Iannuzzi, Leopoldo; Lawley, Cynthia; Macciotta, Nicolò; McClure, Matthew; Mancini, Giordano; Matassino, Donato; Mazza, Raffaele; Milanesi, Marco; Moioli, Bianca; Morandi, Nicola; Ramunno, Luigi; Peretti, Vincenzo; Pilla, Fabio; Ramelli, Paola; Schroeder, Steven; Strozzi, Francesco; Thibaud-Nissen, Francoise; Zicarelli, Luigi; Ajmone-Marsan, Paolo; Valentini, Alessio; Chillemi, Giovanni; Zimin, Aleksey
2017-01-01
Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2n = 50) and the swamp (2n = 48) buffalo. Here we describe a draft quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366 983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21 398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21 711 predicted protein coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1. PMID:29048578
Shimizu, Tokurou; Tanizawa, Yasuhiro; Mochizuki, Takako; Nagasaki, Hideki; Yoshioka, Terutaka; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposon were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome. PMID:29259619
Shimizu, Tokurou; Tanizawa, Yasuhiro; Mochizuki, Takako; Nagasaki, Hideki; Yoshioka, Terutaka; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma ("Miyagawa Wase") was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposon were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.
Comparative Genomics as a Foundation for Evo-Devo Studies in Birds.
Grayson, Phil; Sin, Simon Y W; Sackton, Timothy B; Edwards, Scott V
2017-01-01
Developmental genomics is a rapidly growing field, and high-quality genomes are a useful foundation for comparative developmental studies. A high-quality genome forms an essential reference onto which the data from numerous assays and experiments, including ChIP-seq, ATAC-seq, and RNA-seq, can be mapped. A genome also streamlines and simplifies the development of primers used to amplify putative regulatory regions for enhancer screens, cDNA probes for in situ hybridization, microRNAs (miRNAs) or short hairpin RNAs (shRNA) for RNA interference (RNAi) knockdowns, mRNAs for misexpression studies, and even guide RNAs (gRNAs) for CRISPR knockouts. Finally, much can be gleaned from comparative genomics alone, including the identification of highly conserved putative regulatory regions. This chapter provides an overview of laboratory and bioinformatics protocols for DNA extraction, library preparation, library quantification, and genome assembly, from fresh or frozen tissue to a draft avian genome. Generating a high-quality draft genome can provide a developmental research group with excellent resources for their study organism, opening the doors to many additional assays and experiments.
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences
Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.
2012-01-01
ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
USDA-ARS's Scientific Manuscript database
We produced and assembled high quality draft genomes (~100X coverage) for 305 Salmonella from a diverse a group of over 100 serovars and diverse sources. Of these isolates, 119 were selected to capture a wide variety of different AR patterns. In our subsequent analyses we included 285 additional pub...
Sánchez-Rangel, Diana; Hernández-Domínguez, Eric; Pérez-Torres, Claudia-Anahí; Ortiz-Castro, Randy; Villafán, Emanuel; Alonso-Sánchez, Alexandro; Rodríguez-Haas, Benjamín; López-Buenfil, Abel; García-Avila, Clemente; Ramírez-Pool, José-Abrahán
2017-01-01
Here, we report the genome of Fusarium euwallaceae strain HFEW-16-IV-019, an isolate obtained from Kuroshio shot hole borer (a Euwallacea sp.). These beetles were collected in Tijuana, Mexico, from elm trees showing typical symptoms of Fusarium dieback. The final assembly consists of 287 scaffolds spanning 48,274,071 bp and 13,777 genes. PMID:28860245
Draft Genome Sequence of Desulfovibrio BerOc1, a Mercury-Methylating Strain
Gassie, Claire; Bouchez, Oliver; Klopp, Christophe; Guyoneaud, Rémy
2017-01-01
Desulfovibrio BerOc1 is a sulfate-reducing bacterium isolated from the Berre lagoon (French Mediterranean coast). BerOc1 is able to methylate and demethylate mercury. The genome size is 4,081,579 bp assembled into five contigs. We identified the hgcA and hgcB genes involved in mercury methylation, but not those responsible for mercury demethylation. PMID:28104657
Genome Sequence of the Freshwater Yangtze Finless Porpoise.
Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jingsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang; Li, Songhai
2018-04-16
The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603.
Genome Sequence of the Freshwater Yangtze Finless Porpoise
Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jinsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang
2018-01-01
The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603. PMID:29659530
Shim, Donghwan; Park, Sin-Gi; Kim, Kangmin; Bae, Wonsil; Lee, Gir Won; Ha, Byeong-Suk; Ro, Hyeon-Su; Kim, Myungkil; Ryoo, Rhim; Rhee, Sung-Keun; Nou, Ill-Sup; Koo, Chang-Duck; Hong, Chang Pyo; Ryu, Hojin
2016-04-10
Lentinula edodes, the popular shiitake mushroom, is one of the most important cultivated edible mushrooms. It is used as a food and for medicinal purposes. Here, we present the 46.1 Mb draft genome of L. edodes, comprising 13,028 predicted gene models. The genome assembly consists of 31 scaffolds. Gene annotation provides key information about various signaling pathways and secondary metabolites. This genomic information should help establish the molecular genetic markers for MAS/MAB and increase our understanding of the genome structure and function. Copyright © 2016 Elsevier B.V. All rights reserved.
Geib, Scott M; Hall, Brian; Derego, Theodore; Bremer, Forest T; Cannoles, Kyle; Sim, Sheina B
2018-04-01
One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI's annotation table format, these tools typically require back-end setup and connection to a Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and utilizes a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline. The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI.
Hall, Brian; Derego, Theodore; Bremer, Forest T; Cannoles, Kyle
2018-01-01
Background: One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI’s annotation table format, these tools typically require back-end setup and connection to a Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and utilizes a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. Findings: The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline. Conclusions: The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI. PMID:29635297
Moskalev, Alexey А; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V
2017-12-28
The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of adaptation to hypoxic conditions.
An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L
2017-01-01
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.
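The N50 statistic quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common definition: the length L such that contigs of length L or longer together cover at least half of the total assembly. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the paper; only the two contig N50 values at the end come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: total length 54, half is 27.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 10 + 9 + 8 = 27 reaches half, so N50 is 8

# Ratio of the contig N50 values reported in the abstract.
print(round(25361 / 8206, 2))  # 3.09, consistent with "more than three times"
```

Because N50 is weighted by length, a few long contigs can raise it sharply even when millions of short contigs remain, which is why it is usually reported alongside the contig count.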
Gilchrist, Anthony Stuart; Shearman, Deborah C A; Frommer, Marianne; Raphael, Kathryn A; Deshpande, Nandan P; Wilkins, Marc R; Sherwin, William B; Sved, John A
2014-12-20
The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations. We produced a draft de novo genome assembly of Australia's major tephritid pest species, Bactrocera tryoni. The male genome (650-700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions. This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. 
For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA families. These genetic resources will facilitate the further investigations of genetic mechanisms responsible for the behavioural and morphological differences between these three species and other tephritids. We have also shown how whole genome sequence data can be used to generate simple diagnostic tests between very closely-related species where only one of the species is scaffolded.
Recovering complete and draft population genomes from metagenome datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.
2016-03-08
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
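The idea of grouping contigs by covarying coverage across datasets can be sketched as follows. This is a simplified illustration, not the method of any tool reviewed in the paper: it assumes each contig has a coverage value per sample and greedily groups contigs whose profiles correlate strongly with the first member of a bin. The names `pearson`, `bin_by_coverage`, the 0.95 threshold, and the toy profiles are all hypothetical; real binners typically also use sequence composition and more robust clustering.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / sqrt(var_x * var_y)


def bin_by_coverage(profiles, threshold=0.95):
    """Greedily assign each contig to the first bin whose seed profile
    correlates with it at or above the threshold (hypothetical cutoff)."""
    bins = []  # list of (seed_profile, member_names)
    for name, profile in profiles.items():
        for seed, members in bins:
            if pearson(seed, profile) >= threshold:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]


# Toy per-sample coverage for four contigs across four samples (hypothetical).
toy_profiles = {
    "contig_a": [10, 40, 5, 20],
    "contig_b": [12, 43, 6, 22],
    "contig_c": [50, 5, 30, 8],
    "contig_d": [48, 6, 33, 7],
}
toy_bins = bin_by_coverage(toy_profiles)
```

In this toy input, contigs a and b rise and fall together across samples, as do c and d, so they end up in two separate bins. A single sample would not separate them this way, which is why multiple datasets help.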
Sedlar, Karel; Kolek, Jan; Skutkova, Helena; Branska, Barbora; Provaznik, Ivo; Patakova, Petra
2015-11-20
Clostridium pasteurianum NRRL B-598 is a non-type, oxygen-tolerant, spore-forming, mesophilic and heterofermentative strain with high hydrogen production and the ability to carry out acetone-butanol fermentation (ethanol production being negligible). Here, we present the annotated complete genome sequence of this bacterium, replacing the previous draft genome assembly. The genome, consisting of a single circular 6,186,879 bp chromosome with no plasmid, was determined using PacBio RSII and Roche 454 sequencing. Copyright © 2015 Elsevier B.V. All rights reserved.
A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety
Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto
2007-01-01
Background Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (a duplication not reported before). Conclusions Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749
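The sizes reported in the Principal Findings can be turned into coverage fractions with a short worked calculation. The three sizes are taken from the abstract; the derived percentages and the variable names are illustrative and do not appear in the source.

```python
# Sizes in Mb, as reported in the abstract above.
estimated_genome = 504.6   # estimated genome size
assembled = 477.1          # sequence assembled into metacontigs
anchored = 435.1           # sequence anchored to the 19 linkage groups

# Derived fractions (not stated in the abstract).
assembled_fraction = assembled / estimated_genome           # about 0.946
anchored_of_assembled = anchored / assembled                # about 0.912
anchored_of_genome = anchored / estimated_genome            # about 0.862

for label, value in [
    ("assembled / estimated genome", assembled_fraction),
    ("anchored / assembled", anchored_of_assembled),
    ("anchored / estimated genome", anchored_of_genome),
]:
    print(f"{label}: {value:.1%}")
```

Under these figures, roughly 95% of the estimated genome is represented in metacontigs and about 86% of it is placed on linkage groups.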
Peng, Xinxia; Alföldi, Jessica; Gori, Kevin; Eisfeld, Amie J; Tyler, Scott R; Tisoncik-Go, Jennifer; Brawand, David; Law, G Lynn; Skunca, Nives; Hatta, Masato; Gasper, David J; Kelly, Sara M; Chang, Jean; Thomas, Matthew J; Johnson, Jeremy; Berlin, Aaron M; Lara, Marcia; Russell, Pamela; Swofford, Ross; Turner-Maier, Jason; Young, Sarah; Hourlier, Thibaut; Aken, Bronwen; Searle, Steve; Sun, Xingshen; Yi, Yaling; Suresh, M; Tumpey, Terrence M; Siepel, Adam; Wisely, Samantha M; Dessimoz, Christophe; Kawaoka, Yoshihiro; Birren, Bruce W; Lindblad-Toh, Kerstin; Di Palma, Federica; Engelhardt, John F; Palermo, Robert E; Katze, Michael G
2014-12-01
The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.
Peng, Xinxia; Alföldi, Jessica; Gori, Kevin; Eisfeld, Amie J.; Tyler, Scott R.; Tisoncik-Go, Jennifer; Brawand, David; Law, G. Lynn; Skunca, Nives; Hatta, Masato; Gasper, David J.; Kelly, Sara M.; Chang, Jean; Thomas, Matthew J.; Johnson, Jeremy; Berlin, Aaron M.; Lara, Marcia; Russell, Pamela; Swofford, Ross; Turner-Maier, Jason; Young, Sarah; Hourlier, Thibaut; Aken, Bronwen; Searle, Steve; Sun, Xingshen; Yi, Yaling; Suresh, M.; Tumpey, Terrence M.; Siepel, Adam; Wisely, Samantha M.; Dessimoz, Christophe; Kawaoka, Yoshihiro; Birren, Bruce W.; Lindblad-Toh, Kerstin; Di Palma, Federica; Engelhardt, John F.; Palermo, Robert E.; Katze, Michael G.
2014-01-01
The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the ‘gold standard’ for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotate 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterize the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time courses, and show distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis (CF) disease progression, we show that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with CF disease. PMID:25402615
Edger, Patrick P; VanBuren, Robert; Colle, Marivi; Poorten, Thomas J; Wai, Ching Man; Niederhuth, Chad E; Alger, Elizabeth I; Ou, Shujun; Acharya, Charlotte B; Wang, Jie; Callow, Pete; McKain, Michael R; Shi, Jinghua; Collier, Chad; Xiong, Zhiyong; Mower, Jeffrey P; Slovin, Janet P; Hytönen, Timo; Jiang, Ning; Childs, Kevin L; Knapp, Steven J
2018-02-01
Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.
Draft Genome Sequence of Zobellia sp. Strain OII3, Isolated from the Coastal Zone of the Baltic Sea.
Harms, Henrik; Poehlein, Anja; Thürmer, Andrea; König, Gabriele M; Schäberle, Till F
2017-09-07
Zobellia sp. strain OII3 was isolated from a marine environmental sample due to its heterotrophic lifestyle, i.e., using Escherichia coli cells as prey. It shows strong agar-lytic activity. The genome was assembled into 41 contigs with a total size of 5.4 Mb, revealing the genetic basis for natural product biosynthesis. Copyright © 2017 Harms et al.
Ibarra-Laclette, Enrique; Sánchez-Rangel, Diana; Hernández-Domínguez, Eric; Pérez-Torres, Claudia-Anahí; Ortiz-Castro, Randy; Villafán, Emanuel; Alonso-Sánchez, Alexandro; Rodríguez-Haas, Benjamín; López-Buenfil, Abel; García-Avila, Clemente; Ramírez-Pool, José-Abrahán
2017-08-31
Here, we report the genome of Fusarium euwallaceae strain HFEW-16-IV-019, an isolate obtained from Kuroshio shot hole borer (a Euwallacea sp.). These beetles were collected in Tijuana, Mexico, from elm trees showing typical symptoms of Fusarium dieback. The final assembly consists of 287 scaffolds spanning 48,274,071 bp and 13,777 genes. Copyright © 2017 Ibarra-Laclette et al.
Bringing the fathead minnow into the genomic era
The fathead minnow is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. While a large amount of molecular information has been gathered on the fathead minnow over the years, the lack of genomic sequence data has limited the utility of the fathead minnow for certain applications. To address this limitation, high-throughput Illumina sequencing technology was employed to sequence the fathead minnow genome. Approximately 100X coverage was achieved by sequencing several libraries of paired-end reads with differing genome insert sizes. Two draft genome assemblies were generated using the SOAPdenovo and String Graph Assembler (SGA) methods, respectively. When these were compared, the SOAPdenovo assembly had a higher scaffold N50 value of 60.4 kbp versus 15.4 kbp, and it also performed better in a Core Eukaryotic Genes Mapping Analysis (CEGMA), mapping 91% versus 67% of genes. As such, this assembly was selected for further development and annotation. The foundation for genome annotation was generated using AUGUSTUS, an ab initio method for gene prediction. A total of 43,345 potential coding sequences were predicted on the genome assembly. These predicted sequences were translated to peptides and queried in a BLAST search against all vertebrates, with 28,290 of these sequences corresponding to zebrafish peptides and 5,242 producing no significant alignments. Additional ty
Draft genome of the gayal, Bos frontalis
Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping
2017-01-01
Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
De novo genome assembly of the soil-borne fungus and tomato pathogen Pyrenochaeta lycopersici
2014-01-01
Background Pyrenochaeta lycopersici is a soil-dwelling ascomycete pathogen that causes corky root rot disease in tomato (Solanum lycopersicum) and other Solanaceous crops, reducing fruit yields by up to 75%. Fungal pathogens that infect roots receive less attention than those infecting the aerial parts of crops despite their significant impact on plant growth and fruit production. Results We assembled a 54.9Mb P. lycopersici draft genome sequence based on Illumina short reads, and annotated approximately 17,000 genes. The P. lycopersici genome is closely related to hemibiotrophs and necrotrophs, in agreement with the phenotypic characteristics of the fungus and its lifestyle. Several gene families related to host–pathogen interactions are strongly represented, including those responsible for nutrient absorption, the detoxification of fungicides and plant cell wall degradation, the latter confirming that much of the genome is devoted to the pathogenic activity of the fungus. We did not find a MAT gene, which is consistent with the classification of P. lycopersici as an imperfect fungus, but we observed a significant expansion of the gene families associated with heterokaryon incompatibility (HI). Conclusions The P. lycopersici draft genome sequence provided insight into the molecular and genetic basis of the fungal lifestyle, characterizing previously unknown pathogenic behaviors and defining strategies that allow this asexual fungus to increase genetic diversity and to acquire new pathogenic traits. PMID:24767544
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies
2014-01-01
Background The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
Genome assembly and transcriptome resource for river buffalo, Bubalus bubalis (2n = 50).
Williams, John L; Iamartino, Daniela; Pruitt, Kim D; Sonstegard, Tad; Smith, Timothy P L; Low, Wai Yee; Biagini, Tommaso; Bomba, Lorenzo; Capomaccio, Stefano; Castiglioni, Bianca; Coletta, Angelo; Corrado, Federica; Ferré, Fabrizio; Iannuzzi, Leopoldo; Lawley, Cynthia; Macciotta, Nicolò; McClure, Matthew; Mancini, Giordano; Matassino, Donato; Mazza, Raffaele; Milanesi, Marco; Moioli, Bianca; Morandi, Nicola; Ramunno, Luigi; Peretti, Vincenzo; Pilla, Fabio; Ramelli, Paola; Schroeder, Steven; Strozzi, Francesco; Thibaud-Nissen, Francoise; Zicarelli, Luigi; Ajmone-Marsan, Paolo; Valentini, Alessio; Chillemi, Giovanni; Zimin, Aleksey
2017-10-01
Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2n = 50) and the swamp (2n = 48) buffalo. Here we describe a draft quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366 983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21 398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21 711 predicted protein coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1. © The Author 2017. Published by Oxford University Press.
Spider genomes provide insight into composition and evolution of venom and silk
Sanggaard, Kristian W.; Bechsgaard, Jesper S.; Fang, Xiaodong; Duan, Jinjie; Dyrlund, Thomas F.; Gupta, Vikas; Jiang, Xuanting; Cheng, Ling; Fan, Dingding; Feng, Yue; Han, Lijuan; Huang, Zhiyong; Wu, Zongze; Liao, Li; Settepani, Virginia; Thøgersen, Ida B.; Vanthournout, Bram; Wang, Tobias; Zhu, Yabing; Funch, Peter; Enghild, Jan J.; Schauser, Leif; Andersen, Stig U.; Villesen, Palle; Schierup, Mikkel H; Bilde, Trine; Wang, Jun
2014-01-01
Spiders are ecologically important predators with complex venom and extraordinarily tough silk that enables capture of large prey. Here we present the assembled genome of the social velvet spider and a draft assembly of the tarantula genome that represent two major taxonomic groups of spiders. The spider genomes are large with short exons and long introns, reminiscent of mammalian genomes. Phylogenetic analyses place spiders and ticks as sister groups supporting polyphyly of the Acari. Complex sets of venom and silk genes/proteins are identified. We find that venom genes evolved by sequential duplication, and that the toxic effect of venom is most likely activated by proteases present in the venom. The set of silk genes reveals a highly dynamic gene evolution, new types of silk genes and proteins, and a novel use of aciniform silk. These insights create new opportunities for pharmacological applications of venom and biomaterial applications of silk. PMID:24801114
Moura, Quézia; Fernandes, Miriam R; Cerdeira, Louise; Nhambe, Lúcia F; Ienne, Susan; Souza, Tiago A; Lincopan, Nilton
2017-09-01
Multidrug-resistant (MDR) Enterobacter aerogenes strains are frequently associated with nosocomial infections and high mortality rates, representing a serious public health problem. The aim of this study was to present the draft genome sequence of a MDR KPC-2-producing E. aerogenes isolated from a perineal swab of a hospitalised patient in Brazil. Genomic DNA was sequenced using an Illumina MiSeq platform. De novo genome assembly was carried out using the A5-Miseq pipeline, and whole-genome sequence analysis was performed using tools from the Center for Genomic Epidemiology. The strain harboured resistance genes to β-lactams, aminoglycosides, sulphonamides and trimethoprim in addition to genes encoding multidrug efflux system proteins, a quaternary ammonium transporter and heavy metal efflux system proteins. In addition, the strain harboured genes encoding diverse virulence factors. These data might allow a better understanding of the genetic basis of antimicrobial resistance and virulence in E. aerogenes strains. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
First draft genome of an iconic clownfish species (Amphiprion frenatus).
Marcionetti, Anna; Rossier, Victor; Bertrand, Joris A M; Litsios, Glenn; Salamin, Nicolas
2018-02-17
Clownfishes (or anemonefishes) form an iconic group of coral reef fishes, principally known for their mutualistic interaction with sea anemones. They are characterized by particular life history traits, such as a complex social structure and mating system involving sequential hermaphroditism, coupled with an exceptionally long lifespan. Additionally, clownfishes are considered to be one of the rare groups to have experienced an adaptive radiation in the marine environment. Here, we assembled and annotated the first genome of a clownfish species, the tomato clownfish (Amphiprion frenatus). We obtained 17,801 assembled scaffolds, containing a total of 26,917 genes. The completeness of the assembly and annotation was satisfactory, with 96.5% of the Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs) being retrieved in the A. frenatus assembly. The quality of the resulting assembly is comparable to other bony fish assemblies. This resource is valuable for advancing studies of the particular life history traits of clownfishes, as well as being useful for population genetic studies and the development of new phylogenetic markers. It will also open the way to comparative genomics. Indeed, future genomic comparison among closely related fishes may provide means to identify genes related to the unique adaptations to different sea anemone hosts, as well as better characterize the genomic signatures of an adaptive radiation. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Sellera, Fábio P; Fernandes, Miriam R; Moura, Quézia; Souza, Tiago A; Nascimento, Cristiane L; Cerdeira, Louise; Lincopan, Nilton
2018-03-01
The incidence of multidrug-resistant bacteria in wildlife animals has been investigated to improve our knowledge of the spread of clinically relevant antimicrobial resistance genes. The aim of this study was to report the first draft genome sequence of an extensively drug-resistant (XDR) Pseudomonas aeruginosa ST644 isolate recovered from a Magellanic penguin with a footpad infection (bumblefoot) undergoing rehabilitation. The genome was sequenced on an Illumina NextSeq® platform using 150-bp paired-end reads. De novo genome assembly was performed using Velvet v.1.2.10, and the whole genome sequence was evaluated using bioinformatics approaches from the Center for Genomic Epidemiology, whereas an in-house method (mapping of raw whole genome sequence reads) was used to identify chromosomal point mutations. The genome size was calculated at 6,436,450 bp, with 6357 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, phenicols, sulphonamides, tetracyclines, quinolones and fosfomycin; in addition, mutations in the genes gyrA (Thr83Ile), parC (Ser87Leu), phoQ (Arg61His) and pmrB (Tyr345His), conferring resistance to quinolones and polymyxins, respectively, were confirmed. This draft genome sequence can provide useful information for comparative genomic analysis regarding the dissemination of clinically significant antibiotic resistance genes and XDR bacterial species at the human-animal interface. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Draft Genome Sequence of Desulfovibrio BerOc1, a Mercury-Methylating Strain.
Goñi Urriza, Marisol; Gassie, Claire; Bouchez, Oliver; Klopp, Christophe; Guyoneaud, Rémy
2017-01-19
Desulfovibrio BerOc1 is a sulfate-reducing bacterium isolated from the Berre lagoon (French Mediterranean coast). BerOc1 is able to methylate and demethylate mercury. The genome size is 4,081,579 bp assembled into five contigs. We identified the hgcA and hgcB genes involved in mercury methylation, but not those responsible for mercury demethylation. Copyright © 2017 Goñi Urriza et al.
A draft genome assembly of the army worm, Spodoptera frugiperda.
Kakumani, Pavan Kumar; Malhotra, Pawan; Mukherjee, Sunil K; Bhatnagar, Raj K
2014-08-01
Spodoptera is an agriculturally important pest insect, and studies of its biology have been limited by the unavailability of its genome. In the present study, the genomic DNA was sequenced and assembled into 37,243 scaffolds with a total size of 358 Mb and an N50 of 53.7 kb. Based on degree of identity, we could anchor 305 Mb of the genome onto all the 28 chromosomes of Bombyx mori. Repeat elements were identified, which account for 20.28% of the total genome. Further, we predicted 11,595 genes, with an average intron length of 726 bp. The genes were annotated and domain analysis revealed that Sf genes share a significant homology and expression pattern with B. mori, despite differences in KOG gene categories and representation of certain protein families. The present study of the Sf genome would help in the characterization of cellular pathways to understand its biology and in comparative evolutionary studies among lepidopteran family members to help annotate their genomes. Copyright © 2014 Elsevier Inc. All rights reserved.
Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun
2014-07-09
Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.
Silva, Paula Renata Alves da; Simões-Araújo, Jean Luiz; Vidal, Márcia Soares; Cruz, Leonardo Magalhães; Souza, Emanuel Maltempi de; Baldani, José Ivo
Paraburkholderia tropica (syn Burkholderia tropica) are nitrogen-fixing bacteria commonly found in sugarcane. The Paraburkholderia tropica strain Ppe8 is part of the sugarcane inoculant consortium that has a beneficial effect on yield. Here, we report a draft genome sequence of this strain elucidating the mechanisms involved in its interaction mainly with Poaceae. A genome size of approximately 8.75Mb containing 7844 protein coding genes distributed in 526 subsystems was de novo assembled with ABySS and annotated by RAST. Genes related to the nitrogen fixation process, the secretion systems (I, II, III, IV, and VI), and related to a variety of metabolic traits, such as metabolism of carbohydrates, amino acids, vitamins, and proteins, were detected, suggesting a broad metabolic capacity and possible adaptation to plant association. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H
2018-01-05
Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.
Read, Timothy D; Petit, Robert A; Joseph, Sandeep J; Alam, Md Tauqeer; Weil, M Ryan; Ahmad, Maida; Bhimani, Ravila; Vuong, Jocelyn S; Haase, Chad P; Webb, D Harry; Tan, Milton; Dove, Alistair D M
2017-07-14
The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species. Therefore, it is also the largest extant species of the paraphyletic assemblage commonly referred to as fishes. As both a phenotypic extreme and a member of the group Chondrichthyes - the sister group to the remaining gnathostomes, which includes all tetrapods and therefore also humans - its genome is of substantial comparative interest. Whale sharks are also listed as an endangered species on the International Union for Conservation of Nature's Red List of threatened species and are of growing popularity as both a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and understanding global population structure. We characterised the nuclear genome of the whale shark using next generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected from the Georgia Aquarium. The data set consisted of 878,654,233 reads, which yielded a draft assembly of 1,213,200 contigs and 997,976 scaffolds. The estimated genome size was 3.44Gb. As expected, the proteome of the whale shark was most closely related to the only other complete genome of a cartilaginous fish, the holocephalan elephant shark. The whale shark contained a novel Toll-like-receptor (TLR) protein with sequence similarity to both the TLR4 and TLR13 proteins of mammals and TLR21 of teleosts. The data are publicly available on GenBank, FigShare, and from the NCBI Short Read Archive under accession number SRP044374. This represents the first shotgun elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation, and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.
Minogue, T D; Daligault, H E; Davenport, K W; Bishop-Lilly, K A; Bruce, D C; Chain, P S; Coyne, S R; Chertkov, O; Freitas, T; Frey, K G; Jaissle, J; Koroleva, G I; Ladner, J T; Palacios, G F; Redden, C L; Xu, Y; Johnson, S L
2014-10-23
The Enterobacteriaceae are environmental and enteric microbes. We sequenced the genomes of two Enterobacter reference strains, E. aerogenes CDC 6003-71 and E. cloacae CDC 442-68, as well as one near neighbor used as an exclusionary reference for diagnostics, Pantoea agglomerans CDC UA0804-01. The genome sizes range from 4.72 to 5.55 Mbp and have G+C contents from 54.6 to 55.1%. Copyright © 2014 Minogue et al.
Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique
2016-03-16
Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
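The scaffold statistics quoted in the abstract above (an N50 with the number of scaffolds needed to reach it, and the 80% reduction in scaffold count) can be reproduced with a few lines. The following is a minimal sketch, not the authors' pipeline; the function name `n50` and the toy scaffold lengths are illustrative assumptions, and only the counts 7513 and 1532 come from the abstract.

```python
def n50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the scaffold at which the running total of
    lengths, sorted from longest to shortest, first reaches half of the
    assembly size. L50 is how many scaffolds were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths given")


# Hypothetical toy assembly (arbitrary units), total length 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50(toy_lengths)

# Reduction in scaffold number reported in the abstract: 7513 -> 1532.
reduction = (7513 - 1532) / 7513
print(f"toy N50 = {toy_n50}, L50 = {toy_l50}")
print(f"scaffold count reduced by {reduction:.1%}")
```

Read this way, "3.0 Mb (26 scaffolds)" means that the 26 longest scaffolds already contain half of the assembled sequence.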
Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.
Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang
2015-03-01
The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Perry, George H; Reeves, Darryl; Melsted, Páll; Ratan, Aakrosh; Miller, Webb; Michelini, Katelyn; Louis, Edward E; Pritchard, Jonathan K; Mason, Christopher E; Gilad, Yoav
2012-01-01
We present a high-coverage draft genome assembly of the aye-aye (Daubentonia madagascariensis), a highly unusual nocturnal primate from Madagascar. Our assembly totals ~3.0 billion bp (3.0 Gb), roughly the size of the human genome, comprised of ~2.6 million scaffolds (N50 scaffold size = 13,597 bp) based on short paired-end sequencing reads. We compared the aye-aye genome sequence data with four other published primate genomes (human, chimpanzee, orangutan, and rhesus macaque) as well as with the mouse and dog genomes as nonprimate outgroups. Unexpectedly, we observed strong evidence for a relatively slow substitution rate in the aye-aye lineage compared with these and other primates. In fact, the aye-aye branch length is estimated to be ~10% shorter than that of the human lineage, which is known for its low substitution rate. This finding may be explained, in part, by the protracted aye-aye life-history pattern, including late weaning and age of first reproduction relative to other lemurs. Additionally, the availability of this draft lemur genome sequence allowed us to polarize nucleotide and protein sequence changes to the ancestral primate lineage-a critical period in primate evolution, for which the relevant fossil record is sparse. Finally, we identified 293,800 high-confidence single nucleotide polymorphisms in the donor individual for our aye-aye genome sequence, a captive-born individual from two wild-born parents. The resulting heterozygosity estimate of 0.051% is the lowest of any primate studied to date, which is understandable considering the aye-aye's extensive home-range size and relatively low population densities. Yet this level of genetic diversity also suggests that conservation efforts benefiting this unusual species should be prioritized, especially in the face of the accelerating degradation and fragmentation of Madagascar's forests.
Snake Genome Sequencing: Results and Future Prospects
Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.
2016-01-01
Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957
De novo assembly of a draft genome for Cucumis hystrix, the closest relative of cucumber
USDA-ARS's Scientific Manuscript database
Cucumis hystrix (2n = 2x = 24, HH) is the only known species that is cross-compatible with cucumber and has a great potential for cucumber improvement. To facilitate introgression of C. hystrix chromatins into cucumber genetic background through development of introgression library, we sequenced two...
USDA-ARS's Scientific Manuscript database
The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the causal agent for the citrus greening or Huanglongbing disease which threatens citrus industry worldwide. This vector is the primary target of approaches to stop th...
USDA-ARS's Scientific Manuscript database
Premise of the study: Microsatellite markers were developed for Plasmopara obducens, the causal agent of the newly emergent downy mildew disease of Impatiens walleriana. Methods and Results: A 151.2 Mb draft genome assembly was generated from P. obducens using Illumina technology and mined to identi...
Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.
Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D
2016-08-17
Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. 
Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X Pac-Bio assembly for cow is underway and may result in a significantly improved reference assembly.
Qi, Weihong; Vaughan, Lloyd; Katharios, Pantelis; Schlapbach, Ralph; Seth-Smith, Helena M.B.
2016-01-01
Advances in single-cell and mini-metagenome sequencing have enabled important investigations into uncultured bacteria. In this study, we applied the mini-metagenome sequencing method to assemble genome drafts of the uncultured causative agents of epitheliocystis, an emerging infectious disease in the Mediterranean aquaculture species gilthead seabream. We sequenced multiple cyst samples and constructed 11 genome drafts from a novel beta-proteobacterial lineage, Candidatus Ichthyocystis. The draft genomes demonstrate features typical of pathogenic bacteria with an obligate intracellular lifestyle: a reduced genome of up to 2.6 Mb, reduced G + C content, and reduced metabolic capacity. Reconstruction of metabolic pathways reveals that Ca. Ichthyocystis genomes lack all amino acid synthesis pathways, compelling them to scavenge from the fish host. All genomes encode type II, III, and IV secretion systems, a large repertoire of predicted effectors, and a type IV pilus. These are all considered to be virulence factors, required for adherence, invasion, and host manipulation. However, no evidence of lipopolysaccharide synthesis could be found. Beyond the core functions shared within the genus, alignments showed distinction into different species, characterized by alternative large gene families. These comprise up to a third of each genome, appear to have arisen through duplication and diversification, encode many effector proteins, and are seemingly critical for virulence. Thus, Ca. Ichthyocystis represents a novel obligatory intracellular pathogenic beta-proteobacterial lineage. The methods used: mini-metagenome analysis and manual annotation, have generated important insights into the lifestyle and evolution of the novel, uncultured pathogens, elucidating many putative virulence factors including an unprecedented array of novel gene families. PMID:27190004
Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong
2015-09-03
Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East-Asia. The annual product of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in Irys system, the first whole-genome map for large yellow croaker has been constructed and thus highly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by the whole-genome mapping for aquatic-organisms. 
Our study demonstrates a promising application of the whole-genome mapping on genome maps construction for other non-model organisms in a fast and reliable manner.
Dukić, Marinela; Berner, Daniel; Roesti, Marius; Haag, Christoph R; Ebert, Dieter
2016-10-13
Recombination rate is an essential parameter for many genetic analyses. Recombination rates are highly variable across species, populations, individuals and different genomic regions. Due to the profound influence that recombination can have on intraspecific diversity and interspecific divergence, characterization of recombination rate variation emerges as a key resource for population genomic studies and emphasises the importance of high-density genetic maps as tools for studying genome biology. Here we present such a high-density genetic map for Daphnia magna, and analyse patterns of recombination rate across the genome. A F2 intercross panel was genotyped by Restriction-site Associated DNA sequencing to construct the third-generation linkage map of D. magna. The resulting high-density map included 4037 markers covering 813 scaffolds and contigs that sum up to 77 % of the currently available genome draft sequence (v2.4) and 55 % of the estimated genome size (238 Mb). Total genetic length of the map presented here is 1614.5 cM and the genome-wide recombination rate is estimated to 6.78 cM/Mb. Merging genetic and physical information we consistently found that recombination rate estimates are high towards the peripheral parts of the chromosomes, while chromosome centres, harbouring centromeres in D. magna, show very low recombination rate estimates. Due to its high-density, the third-generation linkage map for D. magna can be coupled with the draft genome assembly, providing an essential tool for genome investigation in this model organism. Thus, our linkage map can be used for the on-going improvements of the genome assembly, but more importantly, it has enabled us to characterize variation in recombination rate across the genome of D. magna for the first time. These new insights can provide a valuable assistance in future studies of the genome evolution, mapping of quantitative traits and population genetic studies.
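The genome-wide recombination rate in the abstract above follows directly from the two reported totals. A short worked check, assuming (as the figures suggest) that the rate is the total genetic length divided by the estimated genome size; the variable names are illustrative.

```python
# Values taken from the abstract.
genetic_length_cm = 1614.5   # total genetic length of the linkage map, in cM
genome_size_mb = 238.0       # estimated genome size, in Mb
mapped_fraction = 0.55       # share of the estimated genome covered by the map

# Genome-wide average recombination rate in cM per Mb.
rate_cm_per_mb = genetic_length_cm / genome_size_mb

# Physical sequence represented by the mapped scaffolds and contigs, in Mb.
mapped_mb = mapped_fraction * genome_size_mb

print(f"{rate_cm_per_mb:.2f} cM/Mb")       # 6.78 cM/Mb
print(f"{mapped_mb:.1f} Mb on the map")    # 130.9 Mb on the map
```

This is a genome-wide average only; the abstract reports that local rates are high towards chromosome ends and very low near the centres.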
Hierarchical Scaffolding With Bambus
Pop, Mihai; Kosack, Daniel S.; Salzberg, Steven L.
2004-01-01
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site. PMID:14707177
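One ingredient of scaffolding with paired reads, as described in the abstract above, is deciding which pairs of contigs are linked reliably enough to be placed next to each other. The sketch below illustrates only that idea: it counts read pairs whose two ends fall in different contigs and keeps contig pairs with enough support. It is not Bambus's algorithm and does not use its interface; the function name, the `min_support` threshold and the toy links are all assumptions, and orientation and distance constraints are ignored.

```python
from collections import Counter


def bundle_links(mate_links, min_support=2):
    """Group read-pair links by contig pair and keep well-supported ones.

    mate_links: iterable of (contig_a, contig_b) tuples, one per read pair
    whose two reads map to different contigs.
    Returns a dict mapping an unordered contig pair to its link count.
    """
    counts = Counter(tuple(sorted(pair)) for pair in mate_links)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical links between three contigs; the single c1-c3 link is
# treated as unreliable and dropped.
links = [
    ("c1", "c2"), ("c2", "c1"),
    ("c2", "c3"), ("c3", "c2"), ("c3", "c2"),
    ("c1", "c3"),
]
supported = bundle_links(links)
print(supported)
```

The surviving pairs (c1-c2 and c2-c3) suggest the order c1, c2, c3; a real scaffolder would also check that the implied orientations and gap sizes are mutually consistent.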
Pajuelo, Mónica J; Eguiluz, María; Dahlstrom, Eric; Requena, David; Guzmán, Frank; Ramirez, Manuel; Sheen, Patricia; Frace, Michael; Sammons, Scott; Cama, Vitaliano; Anzick, Sarah; Bruno, Dan; Mahanty, Siddhartha; Wilkins, Patricia; Nash, Theodore; Gonzalez, Armando; García, Héctor H; Gilman, Robert H; Porcella, Steve; Zimic, Mirko
2015-12-01
Infections with Taenia solium are the most common cause of adult acquired seizures worldwide, and are the leading cause of epilepsy in developing countries. A better understanding of the genetic diversity of T. solium will improve parasite diagnostics and transmission pathways in endemic areas thereby facilitating the design of future control measures and interventions. Microsatellite markers are useful genome features, which enable strain typing and identification in complex pathogen genomes. Here we describe microsatellite identification and characterization in T. solium, providing information that will assist in global efforts to control this important pathogen. For genome sequencing, T. solium cysts and proglottids were collected from Huancayo and Puno in Peru, respectively. Using next generation sequencing (NGS) and de novo assembly, we assembled two draft genomes and one hybrid genome. Microsatellite sequences were identified and 36 of them were selected for further analysis. Twenty T. solium isolates were collected from Tumbes in the northern region, and twenty from Puno in the southern region of Peru. The size-polymorphism of the selected microsatellites was determined with multi-capillary electrophoresis. We analyzed the association between microsatellite polymorphism and the geographic origin of the samples. The predicted size of the hybrid (proglottid genome combined with cyst genome) T. solium genome was 111 Mb with a GC content of 42.54%. A total of 7,979 contigs (>1,000 nt) were obtained. We identified 9,129 microsatellites in the Puno-proglottid genome and 9,936 in the Huancayo-cyst genome, with 5 or more repeats, ranging from mono- to hexa-nucleotide. Seven microsatellites were polymorphic and 29 were monomorphic within the analyzed isolates. T. solium tapeworms were classified into two genetic groups that correlated with the North/South geographic origin of the parasites. The availability of draft genomes for T. solium represents a significant step towards understanding the biology of the parasite. We report here a set of T. solium polymorphic microsatellite markers that appear promising for genetic epidemiology studies.
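The search criterion stated in the abstract above (mono- to hexa-nucleotide motifs with 5 or more repeats) can be expressed as a regular expression. The following is a minimal sketch of such a scan, not the tool the authors used, which the abstract does not name; the function name and the toy sequence are illustrative assumptions, and only perfect, uninterrupted repeats in uppercase A/C/G/T are detected.

```python
import re


def find_microsatellites(sequence, min_repeats=5):
    """Find perfect tandem repeats of 1-6 nt motifs.

    Returns a list of (start, motif, repeat_count) tuples, where start is
    the 0-based position of the repeat tract in the sequence.
    """
    # The lazy quantifier makes the regex try the shortest motif first,
    # so a run like ACACACACAC is reported with motif AC rather than ACAC.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence with a (CA)6 tract and an (A)5 tract.
toy_sequence = "TTG" + "CA" * 6 + "GT" + "A" * 5 + "C"
print(find_microsatellites(toy_sequence))
```

Raising `min_repeats` gives a stricter scan; counts obtained this way depend on the exact rules chosen (for example whether imperfect repeats are allowed), so they would not necessarily match the totals reported in the abstract.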
Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E
2015-01-01
Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.
Dhar, Hena; Swarnkar, Mohit Kumar; Gulati, Arvind; Singh, Anil Kumar; Kasana, Ramesh Chand
2015-02-19
Paenibacillus sp. strain IHB B 3415 is a cellulase-producing psychrotrophic bacterium isolated from a soil sample from the cold deserts of Himachal Pradesh, India. Here, we report an 8.44-Mb assembly of its genome sequence with a G+C content of 50.77%. The data presented here will provide insights into the mechanisms of cellulose degradation at low temperature. Copyright © 2015 Dhar et al.
Humble, E; Martinez-Barrio, A; Forcada, J; Trathan, P N; Thorne, M A S; Hoffmann, M; Wolf, J B W; Hoffman, J I
2016-07-01
Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, reanalysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modelling. © 2015 John Wiley & Sons Ltd.
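The scaffold and contig N50 values quoted in the abstract above are a standard contiguity statistic: the length L such that sequences of length L or longer together hold at least half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because scaffolds join contigs across gaps, a scaffold N50 is normally much larger than the contig N50 of the same assembly, as in the 3.1 Mb versus 27.5 kb figures reported above.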
SmedGD 2.0: The Schmidtea mediterranea genome database
Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro
2016-01-01
Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much-needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent transcription data and provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next-generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.
Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.
Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A; Awosika, Joy; Briska, Adam; Ptashkin, Ryan N; Wagner, Trevor; Rajanna, Chythanya; Tsang, Hsinyi; Johnson, Shannon L; Mokashi, Vishwesh P; Chain, Patrick S G; Sozhamannan, Shanmuga
2015-01-01
Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.
Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.; ...
2015-03-20
Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.
Gan, Han M; Lee, Yin P; Austin, Christopher M
2017-01-01
We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. In the complete genome, the major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are only found in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy, but surprisingly, comparative genomics analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as "hydrogen eating bacteria."
Approaches for in silico finishing of microbial genome sequences.
Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Hanafy, Radwa A; Couger, M B; Baker, Kristina; Murphy, Chelsea; O'Kane, Shannon D; Budd, Connie; French, Donald P; Hoff, Wouter D; Youssef, Noha
2016-09-01
Micrococcus luteus is a predominant member of the skin microbiome. We here report on the genomic analysis of Micrococcus luteus strain O'Kane, which was isolated from an elevator. The partial genome assembly of Micrococcus luteus strain O'Kane is 2.5 Mb with 2256 protein-coding genes and 62 RNA genes. Genomic analysis revealed metabolic versatility with genes involved in the metabolism and transport of glucose, galactose, fructose, mannose, alanine, aspartate, asparagine, glutamate, glutamine, glycine, serine, cysteine, methionine, arginine, proline, histidine, phenylalanine, and fatty acids. Genomic comparison to other M. luteus representatives identified the potential to degrade polyhydroxybutyrates, as well as several antibiotic resistance genes absent from other genomes.
Issa, Mohammad Nouh; Ashhab, Yaqoub
2016-09-22
Brucella melitensis Rev.1 is an avirulent strain that is widely used as a live vaccine to control brucellosis in small ruminants. Although an assembled draft version of Rev.1 genome has been available since 2009, this genome has not been investigated to characterize this important vaccine. In the present work, we used the draft genome of Rev.1 to perform a thorough genomic comparison and sequence analysis to identify and characterize the panel of its unique genetic markers. The draft genome of Rev.1 was compared with genome sequences of 36 different Brucella melitensis strains from the Brucella project of the Broad Institute of MIT and Harvard. The comparative analyses revealed 32 genetic alterations (30 SNPs, 1 single-bp insertion and 1 single-bp deletion) that are exclusively present in the Rev.1 genome. In silico analyses showed that 9 out of the 17 non-synonymous mutations are deleterious. Three ABC transporters are among the disrupted genes that can be linked to virulence attenuation. Out of the 32 mutations, 11 Rev.1 specific markers were selected to test their potential to discriminate Rev.1 using a bi-directional allele-specific PCR assay. Six markers were able to distinguish between Rev.1 and a set of control strains. We succeeded in identifying a panel of 32 genome-specific markers of the B. melitensis Rev.1 vaccine strain. Extensive in silico analysis showed that a considerable number of these mutations could severely affect the function of the associated genes. In addition, some of the discovered markers were able to discriminate Rev.1 strain from a group of control strains using practical PCR tests that can be applied in resource-limited settings. Copyright © 2016 Elsevier Ltd. All rights reserved.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
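The abstract above describes computationally shredding consensus sequences into fixed-length overlapping pieces (2 kb for the 454 data, 1.5 kb for the Illumina data) before the combined assembly. The sketch below illustrates that general idea only. The amount of overlap is not stated in the abstract, so the `overlap` parameter, its default value, and the function name `shred` are assumptions made for illustration, not the authors' actual procedure.

```python
def shred(sequence, shred_len=2000, overlap=1000):
    """Cut a sequence into fixed-length pieces that overlap their neighbours.

    shred_len: length of each piece (2 kb in the abstract's 454 case).
    overlap:   bases shared by consecutive pieces (assumed value; the
               abstract does not state the overlap that was used).
    The final piece may be shorter than shred_len.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must satisfy 0 <= overlap < shred_len")
    if not sequence:
        return []
    step = shred_len - overlap
    pieces = []
    start = 0
    while True:
        pieces.append(sequence[start:start + shred_len])
        # Stop once the current piece reaches the end of the sequence.
        if start + shred_len >= len(sequence):
            break
        start += step
    return pieces


# Toy example with tiny numbers so the overlap is easy to see.
print(shred("ACGTACGTAC", shred_len=4, overlap=2))
```

Shredding a consensus in this way lets sequence from one platform be fed to an assembler as if it were a set of long pseudo-reads, which is one plausible reason for the step described above.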
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...
2015-08-15
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M
Background: Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results: A closed, high-quality genome sequence for C. autoethanogenum DSM 10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions: Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.
Nishitsuji, Koki; Arimoto, Asuka; Iwai, Kenji; Sudo, Yusuke; Hisata, Kanako; Fujie, Manabu; Arakaki, Nana; Kushiro, Tetsuo; Konishi, Teruko; Shinzato, Chuya; Satoh, Noriyuki; Shoguchi, Eiichi
2016-12-01
The brown alga, Cladosiphon okamuranus (Okinawa mozuku), is economically one of the most important edible seaweeds, and is cultivated for market primarily in Okinawa, Japan. C. okamuranus constitutes a significant source of fucoidan, which has various physiological and biological activities. To facilitate studies of seaweed biology, we decoded the draft genome of C. okamuranus S-strain. The genome size of C. okamuranus was estimated as ∼140 Mbp, smaller than genomes of two other brown algae, Ectocarpus siliculosus and Saccharina japonica. Sequencing with ∼100× coverage yielded an assembly of 541 scaffolds with N50 = 416 kbp. Together with transcriptomic data, we estimated that the C. okamuranus genome contains 13,640 protein-coding genes, approximately 94% of which have been confirmed with corresponding mRNAs. Comparisons with the E. siliculosus genome identified a set of C. okamuranus genes that encode enzymes involved in biosynthetic pathways for sulfated fucans and alginate biosynthesis. In addition, we identified C. okamuranus genes for enzymes involved in phlorotannin biosynthesis. The present decoding of the Cladosiphon okamuranus genome provides a platform for future studies of mozuku biology. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Whole genome sequencing data and de novo draft assemblies for 66 teleost species
Malmstrøm, Martin; Matschiner, Michael; Tørresen, Ole K.; Jakobsen, Kjetill S.; Jentoft, Sissel
2017-01-01
Teleost fishes comprise more than half of all vertebrate species, yet genomic data are only available for 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new species of teleosts, vastly expanding the availability of genomic data for this important vertebrate group. We report on de novo assemblies based on low-coverage (9–39×) sequencing and present detailed methodology for all analyses. To facilitate further utilization of this data set, we present statistical analyses of the gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree in relation to the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and the genomic data in general. This data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts. PMID:28094797
Wang, Zhiping; Liu, Lili; Guo, Feng; Zhang, Tong
2015-10-01
Biotreatment processes fed with coking wastewater often achieve insufficient removal of pollutants such as ammonia, phenols, polycyclic aromatic hydrocarbons (PAHs), and especially cyanides. However, only a limited number of bacterial species in pure cultures have been confirmed to metabolize cyanides, which hinders the improvement of these processes. In this study, a microbial community of activated sludge enriched in a coking wastewater treatment plant was analyzed using 454 pyrosequencing and Illumina sequencing to characterize the potential cyanide-degrading bacteria. According to the classification of these pyro-tags, which target the V3/V4 regions of the 16S rRNA gene, half of them were assigned to the family Xanthomonadaceae, implying that Xanthomonadaceae bacteria are well-adapted to coking wastewater. A nearly complete draft genome of the dominant bacterium was reconstructed from the metagenome of this community to explore cyanide metabolism based on analysis of the genome. The assembled 16S rRNA gene from this draft genome showed that this bacterium was a novel species of Thermomonas within Xanthomonadaceae, which was further verified by comparative genomics. The annotation using KEGG and Pfam identified genes related to cyanide metabolism, including genes responsible for the iron-harvesting system, cyanide-insensitive terminal oxidase, cyanide hydrolase/nitrilase, and thiosulfate:cyanide transferase. Phylogenetic analysis showed that these genes had homologs in previously identified genomes of bacteria within Xanthomonadaceae and even presented similar gene cassettes, thus implying an inherent cyanide-decomposing potential. The findings of this study expand our knowledge about the bacterial degradation of cyanide compounds and will be helpful in the remediation of cyanide contamination.
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A
2016-10-15
Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships, which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS. Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Koutsovoulos, Georgios; Laetsch, Dominik R.; Stevens, Lewis; Daub, Jennifer; Conlon, Claire; Maroon, Habib; Thomas, Fran; Aboobaker, Aziz A.
2016-01-01
Tardigrades are meiofaunal ecdysozoans that are key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through cryptobiosis. In a recent paper [Boothby TC, et al. (2015) Proc Natl Acad Sci USA 112(52):15976–15981], the authors concluded that the tardigrade Hypsibius dujardini had an unprecedented proportion (17%) of genes originating through functional horizontal gene transfer (fHGT) and speculated that fHGT was likely formative in the evolution of cryptobiosis. We independently sequenced the genome of H. dujardini. As expected from whole-organism DNA sampling, our raw data contained reads from nontarget genomes. Filtering using metagenomics approaches generated a draft H. dujardini genome assembly of 135 Mb with superior assembly metrics to the previously published assembly. Additional microbial contamination likely remains. We found no support for extensive fHGT. Among 23,021 gene predictions we identified 0.2% strong candidates for fHGT from bacteria and 0.2% strong candidates for fHGT from nonmetazoan eukaryotes. Cross-comparison of assemblies showed that the overwhelming majority of HGT candidates in the Boothby et al. genome derived from contaminants. We conclude that fHGT into H. dujardini accounts for at most 1–2% of genes and that the proposal that one-sixth of tardigrade genes originate from functional HGT events is an artifact of undetected contamination. PMID:27035985
Koutsovoulos, Georgios; Kumar, Sujai; Laetsch, Dominik R; Stevens, Lewis; Daub, Jennifer; Conlon, Claire; Maroon, Habib; Thomas, Fran; Aboobaker, Aziz A; Blaxter, Mark
2016-05-03
Tardigrades are meiofaunal ecdysozoans that are key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through cryptobiosis. In a recent paper [Boothby TC, et al. (2015) Proc Natl Acad Sci USA 112(52):15976-15981], the authors concluded that the tardigrade Hypsibius dujardini had an unprecedented proportion (17%) of genes originating through functional horizontal gene transfer (fHGT) and speculated that fHGT was likely formative in the evolution of cryptobiosis. We independently sequenced the genome of H. dujardini. As expected from whole-organism DNA sampling, our raw data contained reads from nontarget genomes. Filtering using metagenomics approaches generated a draft H. dujardini genome assembly of 135 Mb with superior assembly metrics to the previously published assembly. Additional microbial contamination likely remains. We found no support for extensive fHGT. Among 23,021 gene predictions we identified 0.2% strong candidates for fHGT from bacteria and 0.2% strong candidates for fHGT from nonmetazoan eukaryotes. Cross-comparison of assemblies showed that the overwhelming majority of HGT candidates in the Boothby et al. genome derived from contaminants. We conclude that fHGT into H. dujardini accounts for at most 1-2% of genes and that the proposal that one-sixth of tardigrade genes originate from functional HGT events is an artifact of undetected contamination.
NASA Astrophysics Data System (ADS)
Hernsdorf, A. W.; Amano, Y.; Suzuki, Y.; Ise, K.; Thomas, B. C.; Banfield, J. F.
2015-12-01
Terrestrial sediments are an important global reservoir for methane. Microorganisms in the deep subsurface play a critical role in the methane cycle, yet much remains to be learned about their diversity and metabolisms. To provide more comprehensive insight into the microbiology of the methane cycle in the deep subsurface, we conducted a genome-resolved study of samples collected from the Horonobe Underground Research Laboratory (HURL), Japan. Groundwater samples were obtained from three boreholes from a depth range of between 140 m and 250 m in two consecutive years. Groundwater was filtered and metagenomic DNA extracted and sequenced, and the sequence data assembled. Based on the sequences of phylogenetically informative genes on the assembled fragments, we detected a high degree of overlap in community composition across a vertical transect within one borehole at the two sampling times. However, there was comparatively little similarity observed among communities across boreholes. Spatial and temporal abundance patterns were used in combination with tetranucleotide signatures of assembled genome fragments to bin the data and reconstruct over 200 unique draft genomes, of which 137 are considered to be of high quality (>90% complete). The deepest samples from one borehole were highly dominated by an archaeon identified as ANME-2D; this organism was also present at lower abundance in all other samples from that borehole. Also abundant in these microbial communities were novel members of the Gammaproteobacteria, Saccharibacteria (TM7) and Tenericutes phyla. Notably, a ~2 Mbp draft genome for the ANME-2D archaeon was reconstructed. As expected, the genome encodes all of the genes predicted to be involved in the reverse methanogenesis pathway. In contrast with the previously reported ANME-2D genome, the HURL ANME-2D genome lacks the capacity to reduce nitrate.
However, we identified many multiheme cytochromes with closest similarity to those of the known Fe-reducing/oxidizing archaeon Ferroglobus placidus. Thus, we suggest that ANME-2D may couple methane oxidation to reduction of ferric iron minerals in the sediment and may be generally important as a link between the iron and methane cycles in deep subsurface environments. Such information has important implications for modeling the global carbon cycle.
Draft genome of the leopard gecko, Eublepharis macularius.
Xiong, Zijun; Li, Fang; Li, Qiye; Zhou, Long; Gamble, Tony; Zheng, Jiao; Kui, Ling; Li, Cai; Li, Shengbin; Yang, Huanming; Zhang, Guojie
2016-10-26
Geckos are among the most species-rich reptile groups and the sister clade to all other lizards and snakes. Geckos possess a suite of distinctive characteristics, including adhesive digits, nocturnal activity, hard, calcareous eggshells, and a lack of eyelids. However, one gecko clade, the Eublepharidae, appears to be the exception to most of these 'rules' and lacks adhesive toe pads, has eyelids, and lays eggs with soft, leathery eggshells. These differences make eublepharids an important component of any investigation into the underlying genomic innovations contributing to the distinctive phenotypes in 'typical' geckos. We report high-depth genome sequencing, assembly, and annotation for a male leopard gecko, Eublepharis macularius (Eublepharidae). Illumina sequence data were generated from seven insert libraries (ranging from 170 bp to 20 kb), representing a raw sequencing depth of 136X from 303 Gb of data, reduced to 84X and 187 Gb after filtering. The assembled genome of 2.02 Gb was close to the 2.23 Gb estimated by k-mer analysis. Scaffold and contig N50 sizes of 664 and 20 kb, respectively, were comparable to the previously published Gekko japonicus genome. Repetitive elements accounted for 42% of the genome. Gene annotation yielded 24,755 protein-coding genes, of which 93% were functionally annotated. CEGMA and BUSCO assessment showed that our assembly captured 91% (225 of 248) of the core eukaryotic genes, and 76% of vertebrate universal single-copy orthologs. Assembly of the leopard gecko genome provides a valuable resource for future comparative genomic studies of geckos and other squamate reptiles.
2014-01-01
Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum.
Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715
Palaeosymbiosis Revealed by Genomic Fossils of Wolbachia in a Strongyloidean Nematode
Koutsovoulos, Georgios; Makepeace, Benjamin; Tanya, Vincent N.; Blaxter, Mark
2014-01-01
Wolbachia are common endosymbionts of terrestrial arthropods, and are also found in nematodes: the animal-parasitic filaria, and the plant-parasite Radopholus similis. Lateral transfer of Wolbachia DNA to the host genome is common. We generated a draft genome sequence for the strongyloidean nematode parasite Dictyocaulus viviparus, the cattle lungworm. In the assembly, we identified nearly 1 Mb of sequence with similarity to Wolbachia. The fragments were unlikely to derive from a live Wolbachia infection: most were short, and the genes were disabled through inactivating mutations. Many fragments were co-assembled with definitively nematode-derived sequence. We found limited evidence of expression of the Wolbachia-derived genes. The D. viviparus Wolbachia genes were most similar to filarial strains and strains from the host-promiscuous clade F. We conclude that D. viviparus was infected by Wolbachia in the past, and that clade F-like symbionts may have been the source of filarial Wolbachia infections. PMID:24901418
Argout, X; Martin, G; Droc, G; Fouet, O; Labadie, K; Rivals, E; Aury, J M; Lanaud, C
2017-09-15
Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used a NGS-based approach to significantly improve the assembly of the Belizian Criollo B97-61/B2 genome. We combined four Illumina large insert size mate paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduced the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2 while the scaffold N50 size has increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNAseq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub ( http://cocoa-genome-hub.southgreen.fr ).
Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor
Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.
The Genome of the Western Clawed Frog Xenopus tropicalis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.
2009-10-01
The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.
Yasui, Yasuo; Hirakawa, Hideki; Oikawa, Tetsuo; Toyoshima, Masami; Matsuzaki, Chiaki; Ueno, Mariko; Mizuno, Nobuyuki; Nagatoshi, Yukari; Imamura, Tomohiro; Miyago, Manami; Tanaka, Kojiro; Mise, Kazuyuki; Tanaka, Tsutomu; Mizukoshi, Hiroharu; Mori, Masashi; Fujita, Yasunari
2016-01-01
Chenopodium quinoa Willd. (quinoa) originated from the Andean region of South America, and is a pseudocereal crop of the Amaranthaceae family. Quinoa is emerging as an important crop with the potential to contribute to food security worldwide and is considered to be an optimal food source for astronauts, due to its outstanding nutritional profile and ability to tolerate stressful environments. Furthermore, plant pathologists use quinoa as a representative diagnostic host to identify virus species. However, molecular analysis of quinoa is limited by its genetic heterogeneity due to outcrossing and its genome complexity derived from allotetraploidy. To overcome these obstacles, we established the inbred and standard quinoa accession Kd that enables rigorous molecular analysis, and presented the draft genome sequence of Kd, using an optimized combination of high-throughput next generation sequencing on the Illumina Hiseq 2500 and PacBio RS II sequencers. The de novo genome assembly contained 25 k scaffolds consisting of 1 Gbp with N50 length of 86 kbp. Based on these data, we constructed the free-access Quinoa Genome DataBase (QGDB). Thus, these findings provide insights into the mechanisms underlying agronomically important traits of quinoa and the effect of allotetraploidy on genome evolution. PMID:27458999
Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome.
Mahesh, Hirehally Basavarajegowda; Subba, Pratigya; Advani, Jayshree; Shirke, Meghana Deepak; Loganathan, Ramya Malarini; Chandana, Shankara Lingu; Shilpa, Siddappa; Chatterjee, Oishi; Pinto, Sneha Maria; Prasad, Thottethodi Subrahmanya Keshava; Gowda, Malali
2018-04-01
Indian sandalwood (Santalum album) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees. © 2018 American Society of Plant Biologists. All Rights Reserved.
Perry, George H.; Reeves, Darryl; Melsted, Páll; Ratan, Aakrosh; Miller, Webb; Michelini, Katelyn; Louis, Edward E.; Pritchard, Jonathan K.; Mason, Christopher E.; Gilad, Yoav
2012-01-01
We present a high-coverage draft genome assembly of the aye-aye (Daubentonia madagascariensis), a highly unusual nocturnal primate from Madagascar. Our assembly totals ∼3.0 billion bp (3.0 Gb), roughly the size of the human genome, comprised of ∼2.6 million scaffolds (N50 scaffold size = 13,597 bp) based on short paired-end sequencing reads. We compared the aye-aye genome sequence data with four other published primate genomes (human, chimpanzee, orangutan, and rhesus macaque) as well as with the mouse and dog genomes as nonprimate outgroups. Unexpectedly, we observed strong evidence for a relatively slow substitution rate in the aye-aye lineage compared with these and other primates. In fact, the aye-aye branch length is estimated to be ∼10% shorter than that of the human lineage, which is known for its low substitution rate. This finding may be explained, in part, by the protracted aye-aye life-history pattern, including late weaning and age of first reproduction relative to other lemurs. Additionally, the availability of this draft lemur genome sequence allowed us to polarize nucleotide and protein sequence changes to the ancestral primate lineage—a critical period in primate evolution, for which the relevant fossil record is sparse. Finally, we identified 293,800 high-confidence single nucleotide polymorphisms in the donor individual for our aye-aye genome sequence, a captive-born individual from two wild-born parents. The resulting heterozygosity estimate of 0.051% is the lowest of any primate studied to date, which is understandable considering the aye-aye's extensive home-range size and relatively low population densities. Yet this level of genetic diversity also suggests that conservation efforts benefiting this unusual species should be prioritized, especially in the face of the accelerating degradation and fragmentation of Madagascar's forests. PMID:22155688
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Rui; Parker, Matthew; Seshadri, Rekha
Bradyrhizobium sp. Ai1a-2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen fixing root nodule of Andira inermis collected from Tres Piedras in Costa Rica. In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 9,029,266 bp genome has a GC content of 62.56% with 247 contigs arranged into 246 scaffolds. The assembled genome contains 8,482 protein-coding genes and 102 RNA-only encoding genes. Lastly, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-06-14
Bradyrhizobium sp. Ai1a-2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen fixing root nodule of Andira inermis collected from Tres Piedras in Costa Rica. In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 9,029,266 bp genome has a GC content of 62.56% with 247 contigs arranged into 246 scaffolds. The assembled genome contains 8,482 protein-coding genes and 102 RNA-only encoding genes. Lastly, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Pajuelo, Mónica J.; Eguiluz, María; Dahlstrom, Eric; Requena, David; Guzmán, Frank; Ramirez, Manuel; Sheen, Patricia; Frace, Michael; Sammons, Scott; Cama, Vitaliano; Anzick, Sarah; Bruno, Dan; Mahanty, Siddhartha; Wilkins, Patricia; Nash, Theodore; Gonzalez, Armando; García, Héctor H.; Gilman, Robert H.; Porcella, Steve; Zimic, Mirko
2015-01-01
Background Infections with Taenia solium are the most common cause of adult acquired seizures worldwide, and are the leading cause of epilepsy in developing countries. A better understanding of the genetic diversity of T. solium will improve parasite diagnostics and transmission pathways in endemic areas thereby facilitating the design of future control measures and interventions. Microsatellite markers are useful genome features, which enable strain typing and identification in complex pathogen genomes. Here we describe microsatellite identification and characterization in T. solium, providing information that will assist in global efforts to control this important pathogen. Methods For genome sequencing, T. solium cysts and proglottids were collected from Huancayo and Puno in Peru, respectively. Using next generation sequencing (NGS) and de novo assembly, we assembled two draft genomes and one hybrid genome. Microsatellite sequences were identified and 36 of them were selected for further analysis. Twenty T. solium isolates were collected from Tumbes in the northern region, and twenty from Puno in the southern region of Peru. The size-polymorphism of the selected microsatellites was determined with multi-capillary electrophoresis. We analyzed the association between microsatellite polymorphism and the geographic origin of the samples. Results The predicted size of the hybrid (proglottid genome combined with cyst genome) T. solium genome was 111 MB with a GC content of 42.54%. A total of 7,979 contigs (>1,000 nt) were obtained. We identified 9,129 microsatellites in the Puno-proglottid genome and 9,936 in the Huancayo-cyst genome, with 5 or more repeats, ranging from mono- to hexa-nucleotide. Seven microsatellites were polymorphic and 29 were monomorphic within the analyzed isolates. T. solium tapeworms were classified into two genetic groups that correlated with the North/South geographic origin of the parasites. 
Conclusions/Significance The availability of draft genomes for T. solium represents a significant step towards understanding the biology of the parasite. We report here a set of T. solium polymorphic microsatellite markers that appear promising for genetic epidemiology studies. PMID:26697878
Driscoll, Connor B; Otten, Timothy G; Brown, Nathan M; Dreher, Theo W
2017-01-01
Here we report three complete bacterial genome assemblies from a PacBio shotgun metagenome of a co-culture from Upper Klamath Lake, OR. Genome annotations and culture conditions indicate these bacteria are dependent on carbon and nitrogen fixation from the cyanobacterium Aphanizomenon flos-aquae, whose genome was assembled to draft quality. Due to their taxonomic novelty relative to previously sequenced bacteria, we have temporarily designated these bacteria as incertae sedis Hyphomonadaceae strain UKL13-1 (3,501,508 bp and 56.12% GC), incertae sedis Betaproteobacterium strain UKL13-2 (3,387,087 bp and 54.98% GC), and incertae sedis Bacteroidetes strain UKL13-3 (3,236,529 bp and 37.33% GC). Each genome consists of a single circular chromosome with no identified plasmids. When compared with binned Illumina assemblies of the same three genomes, there was ~7% discrepancy in total genome length. Gaps where Illumina assemblies broke were often due to repetitive elements. Within these missing sequences were essential genes and genes associated with a variety of functional categories. Annotated gene content reveals that both Proteobacteria are aerobic anoxygenic phototrophs, with Betaproteobacterium UKL13-2 potentially capable of phototrophic oxidation of sulfur compounds. Both proteobacterial genomes contain transporters suggesting they are scavenging fixed nitrogen from A. flos-aquae in the form of ammonium. Bacteroidetes UKL13-3 has few completely annotated biosynthetic pathways, and has a comparatively higher proportion of unannotated genes. The genomes were detected in only a few other freshwater metagenomes, suggesting that these bacteria are not ubiquitous in freshwater systems. Our results indicate that long-read sequencing is a viable method for sequencing dominant members from low-diversity microbial communities, and should be considered for environmental metagenomics when conditions meet these requirements.
Draft genome of the Antarctic dragonfish, Parachaenichthys charcoti.
Ahn, Do-Hwan; Shin, Seung Chul; Kim, Bo-Mi; Kang, Seunghyun; Kim, Jin-Hyoung; Ahn, Inhye; Park, Joonho; Park, Hyun
2017-08-01
The Antarctic bathydraconid dragonfish, Parachaenichthys charcoti, is an Antarctic notothenioid teleost endemic to the Southern Ocean. The Southern Ocean has cooled to -1.8ºC over the past 30 million years, and the seawater has retained this cold temperature and isolated oceanic environment because of the Antarctic Circumpolar Current. Notothenioids dominate Antarctic fish, making up 90% of the biomass, and all notothenioids have undergone molecular and ecological diversification to survive in this cold environment. Therefore, they are considered an attractive Antarctic fish model for evolutionary and ancestral genomic studies. Bathydraconidae is a speciose family of the Notothenioidei, the dominant taxonomic component of Antarctic teleosts. To understand the process of evolution of Antarctic fish, we selected a typical Antarctic bathydraconid dragonfish, P. charcoti. Here, we have sequenced, de novo assembled, and annotated a comprehensive genome from P. charcoti. The draft genome of P. charcoti is 709 Mb in size. The N50 contig length is 6145 bp, and its N50 scaffold length is 178 362 kb. The genome of P. charcoti is predicted to contain 32 712 genes, 18 455 of which have been assigned preliminary functions. A total of 8951 orthologous groups common to 7 species of fish were identified, while 333 genes were identified in P. charcoti only; 2519 orthologous groups were also identified in both P. charcoti and N. coriiceps, another Antarctic fish. Four gene ontology terms were statistically overrepresented among the 333 genes unique to P. charcoti, according to gene ontology enrichment analysis. The draft P. charcoti genome will broaden our understanding of the evolution of Antarctic fish in their extreme environment. It will provide a basis for further investigating the unusual characteristics of Antarctic fishes. © The Author 2017. Published by Oxford University Press.
Coughlan, Simone; Taylor, Ali Shirley; Feane, Eoghan; Sanders, Mandy; Schonian, Gabriele; Cotton, James A.
2018-01-01
The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. (Viannia) braziliensis and L. (V.) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. (V.) naiffi and L. (V.) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia: aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia, there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni, L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3’ end of chromosome 34. This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance. PMID:29765675
A draft physical map of a D-genome cotton species (Gossypium raimondii)
2010-01-01
Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427
De Groot, Anne S; Rappuoli, Rino
2004-02-01
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces (microbial genome sequencing, computational immunology and new vaccine technologies) that are shifting genome mining for vaccines onto the forefront of immunology research.
A synteny-based draft genome sequence of the forage grass Lolium perenne.
Byrne, Stephen L; Nagy, Istvan; Pfeifer, Matthias; Armstead, Ian; Swain, Suresh; Studer, Bruno; Mayer, Klaus; Campbell, Jacqueline D; Czaban, Adrian; Hentrup, Stephan; Panitz, Frank; Bendixen, Christian; Hedegaard, Jakob; Caccamo, Mario; Asp, Torben
2015-11-01
Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae sub-family of the grass family (Poaceae). Transcriptome data was used to identify 28,455 gene models, and we utilized macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promote out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed to very high levels in mature pollen grains, and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Pritchard, Leighton; Holden, Nicola J; Bielaszewska, Martina; Karch, Helge; Toth, Ian K
2012-01-01
An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 'positive' E. coli O104:H4 outbreak and 32 'negative' non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. 
We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact.
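The validation figures in this abstract (per-primer sensitivity and false positive rate, and a minimal primer combination with perfect discrimination) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the primer names, the toy amplification results, and the rule that a combination calls an isolate positive only when every primer in it amplifies. None of it is taken from the paper's data or pipeline.

```python
from itertools import combinations


def sensitivity(calls, truth):
    """Fraction of true outbreak isolates that are called positive."""
    true_pos = sum(1 for c, t in zip(calls, truth) if c and t)
    return true_pos / sum(1 for t in truth if t)


def false_positive_rate(calls, truth):
    """Fraction of non-outbreak isolates that are called positive."""
    false_pos = sum(1 for c, t in zip(calls, truth) if c and not t)
    return false_pos / sum(1 for t in truth if not t)


def combined_calls(results, primers):
    """Assumed rule: positive only if every primer in the set amplifies."""
    n_isolates = len(next(iter(results.values())))
    return [all(results[p][i] for p in primers) for i in range(n_isolates)]


def minimal_perfect_combination(results, truth):
    """Smallest primer set with sensitivity 1.0 and false positive rate 0.0."""
    names = sorted(results)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            calls = combined_calls(results, combo)
            if sensitivity(calls, truth) == 1.0 and false_positive_rate(calls, truth) == 0.0:
                return combo
    return None


# Hypothetical toy data: 3 outbreak isolates followed by 4 non-outbreak isolates.
truth = [True, True, True, False, False, False, False]
results = {
    "primer_A": [True, True, True, True, False, False, False],
    "primer_B": [True, True, True, False, True, False, False],
    "primer_C": [True, True, True, True, True, False, False],
}

best = minimal_perfect_combination(results, truth)
print(best)
```

In this toy set each primer alone has some false positives, while requiring both `primer_A` and `primer_B` removes them, which mirrors the kind of result the abstract reports for its two-primer combination.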
Single cell genome analysis of an uncultured heterotrophic stramenopile
NASA Astrophysics Data System (ADS)
Roy, Rajat S.; Price, Dana C.; Schliep, Alexander; Cai, Guohong; Korobeynikov, Anton; Yoon, Hwan Su; Yang, Eun Chan; Bhattacharya, Debashish
2014-04-01
A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative methods. One promising approach to study these lineages is single cell genomics (SCG), whereby an individual cell is captured from nature and genome data are produced from the amplified total DNA. Here we tested the efficacy of SCG to generate a draft genome assembly from a single sample, in this case a cell belonging to the broadly distributed MAST-4 uncultured marine stramenopiles. Using de novo gene prediction, we identified 6,996 protein-encoding genes in the MAST-4 genome. This genetic inventory was sufficient to place the cell within the tree of life (ToL) using multigene phylogenetics and provided preliminary insights into the complex evolutionary history of horizontal gene transfer (HGT) in the MAST-4 lineage.
Hybrid de novo genome assembly of the Chinese herbal fleabane Erigeron breviscapus
Zhang, Guanghui; Zhang, Jing; Liu, Hui; Chen, Wei; Wang, Xiao; Li, Yahe
2017-01-01
Abstract Background: The plants in the Erigeron genus of the Compositae (Asteraceae) family are commonly called fleabanes, possibly due to the belief that certain chemicals in these plants repel fleas. In traditional Chinese medicine, Erigeron breviscapus, which is native to China, was widely used in the treatment of cerebrovascular disease. A handful of bioactive compounds, including scutellarin, 3,5-dicaffeoylquinic acid, and 3,4-dicaffeoylquinic acid, have been isolated from the plant. With the purpose of finding novel medicinal compounds and understanding their biosynthetic pathways, we propose to sequence the genome of E. breviscapus. Findings: We assembled the highly heterozygous E. breviscapus genome using a combination of PacBio single-molecule real-time sequencing and next-generation sequencing methods on the Illumina HiSeq platform. The final draft genome is approximately 1.2 Gb, with contig and scaffold N50 sizes of 18.8 kb and 31.5 kb, respectively. Further analyses predicted 37 504 protein-coding genes in the E. breviscapus genome and 8172 shared gene families among Compositae species. Conclusions: The E. breviscapus genome provides a valuable resource for the investigation of novel bioactive compounds in this Chinese herb. PMID:28431028
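Several abstracts in this listing report contig or scaffold N50 values. As a reminder of what that statistic means, here is a minimal sketch of the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative and are not data from any of the assemblies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are taken from longest to shortest until their running
    total reaches at least half of the summed length; the length of the
    last sequence added is the N50.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness: a misassembled but long scaffold raises N50 just as a correct one does.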
Iquebal, M A; Tomar, Rukam S; Parakhia, M V; Singla, Deepak; Jaiswal, Sarika; Rathod, V M; Padhiyar, S M; Kumar, Neeraj; Rai, Anil; Kumar, Dinesh
2017-07-13
Groundnut (Arachis hypogaea L.) is an important oilseed crop whose production faces a major biotic constraint: stem rot disease caused by the fungus Athelia rolfsii, which causes a 25-80% loss in productivity. Because chemical and biological strategies for combating this fungus are not very effective, genome sequencing can reveal virulence- and pathogenicity-related genes for a better understanding of the host-parasite interaction. We report a draft assembly of the Athelia rolfsii genome of ~73 Mb comprising 8919 contigs. Annotation analysis revealed 16830 genes, among them genes involved in fungicide resistance, virulence and pathogenicity, along with putative effector and lethal genes. Secretome analysis revealed CAZy genes representing 1085 enzymatic genes, comprising glycoside hydrolases, carbohydrate esterases, carbohydrate-binding modules, auxiliary activities, glycosyl transferases and polysaccharide lyases. Repeat analysis revealed 11171 SSRs, LTR, GYPSY and COPIA elements. Comparative analysis with other existing Ascomycotina genomes predicted conserved domain families of WD40, CYP450, Pkinase and ABC transporter, giving insight into the evolution of pathogenicity and virulence. This study should help in understanding pathogenicity and virulence at the molecular level and in developing new control strategies. Such an approach is imperative in the endeavour toward a genome-based solution for stem rot disease management, leading to better productivity of the groundnut crop in tropical regions of the world.
2010-01-01
Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097
Kim, Seong-Ryul; Kwak, Woori; Kim, Hyaekang; Kim, Kee-Young; Kim, Su-Bae; Choi, Kwang-Ho; Kim, Seong-Wan; Hwang, Jae-Sam; Kim, Minjee; Kim, Iksoo; Goo, Tae-Won
2018-01-01
Abstract Background Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, shows different characteristics such as thickness, compressive elasticity, and chemical resistance compared with common silk produced from the domesticated silkworm, Bombyx mori. Its unique characteristics have led to its use in many research fields including biotechnology and medical science, and the scientific as well as economic importance of the wild silk moth continues to gradually increase. However, no genomic information for the wild silk moth, including A. yamamai, is currently available. Findings In order to construct the A. yamamai genome, a total of 147 Gb of sequence was generated using the Illumina and PacBio sequencing platforms, providing 210-fold coverage based on the 700-Mb estimated genome size of A. yamamai. The assembled genome of A. yamamai was 656 Mb (>2 kb) with 3675 scaffolds, and the N50 length of the assembly was 739 kb with a 34.07% GC ratio. Identified repeat elements covered 37.33% of the total genome, and the completeness of the constructed genome assembly was estimated to be 96.7% by Benchmarking Universal Single-Copy Orthologs v2 analysis. A total of 15 481 genes were identified using Evidence Modeler based on the gene prediction results obtained from 3 different methods (ab initio, RNA-seq-based, known-gene-based) and manual curation. Conclusions Here we present the genome sequence of A. yamamai, the first genome sequence of the wild silk moth. These results provide valuable genomic information, which will help enrich our understanding of the molecular mechanisms relating to not only specific phenotypes such as wild silk itself but also the genomic evolution of Saturniidae. PMID:29186418
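The 210-fold coverage quoted in this abstract follows from dividing the total sequenced bases by the estimated genome size. A minimal sketch of that arithmetic is below; the function name is illustrative, and reading "147G" as 147 × 10^9 bases and "700-Mb" as 700 × 10^6 bases is the usual interpretation but is stated here as an assumption.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures from the abstract, under the unit assumptions stated above.
sequenced_bases = 147_000_000_000   # 147 Gb of raw sequence
estimated_genome = 700_000_000      # 700-Mb estimated genome size

print(fold_coverage(sequenced_bases, estimated_genome))  # 210.0
```

Note that this is an average over the whole genome; actual depth varies by region and by platform, so it is a planning figure rather than a guarantee for any given locus.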
2012-01-01
Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the latest version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be treated with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty-eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261
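The question posed in this abstract, whether genes from one prediction set map uniquely onto another, amounts to classifying each match by how many partners its two genes have. The sketch below shows one way to do that. It is not the study's method: the input is assumed to be a plain list of (old gene, new gene) match pairs produced by some upstream comparison, and the gene identifiers are hypothetical.

```python
from collections import Counter


def classify_mappings(pairs):
    """Label each (old_gene, new_gene) match by its mapping relationship.

    one-to-one   : the old gene and the new gene each have a single partner
    one-to-many  : one old gene matches several new genes (e.g. a split model)
    many-to-one  : several old genes match one new gene (e.g. a merged model)
    many-to-many : both genes have more than one partner
    """
    unique_pairs = set(pairs)
    old_degree = Counter(old for old, _ in unique_pairs)
    new_degree = Counter(new for _, new in unique_pairs)

    labels = {}
    for old, new in sorted(unique_pairs):
        if old_degree[old] == 1 and new_degree[new] == 1:
            labels[(old, new)] = "one-to-one"
        elif old_degree[old] == 1:
            labels[(old, new)] = "many-to-one"
        elif new_degree[new] == 1:
            labels[(old, new)] = "one-to-many"
        else:
            labels[(old, new)] = "many-to-many"
    return labels


# Hypothetical matches between an older (8X) and a newer (12X) gene set.
matches = [
    ("g8_1", "g12_1"),   # unique in both sets
    ("g8_2", "g12_2"),   # g8_2 is split into two newer models
    ("g8_2", "g12_3"),
    ("g8_3", "g12_4"),   # two older models merge into g12_4
    ("g8_4", "g12_4"),
]

labels = classify_mappings(matches)
summary = Counter(labels.values())
print(summary)
```

Only the one-to-one class supports a direct transfer of annotation; the other classes flag gene models that were split, merged or rearranged between versions and need manual review.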
Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.
Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción
2016-02-27
In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. 
A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.
Yasui, Yasuo; Hirakawa, Hideki; Oikawa, Tetsuo; Toyoshima, Masami; Matsuzaki, Chiaki; Ueno, Mariko; Mizuno, Nobuyuki; Nagatoshi, Yukari; Imamura, Tomohiro; Miyago, Manami; Tanaka, Kojiro; Mise, Kazuyuki; Tanaka, Tsutomu; Mizukoshi, Hiroharu; Mori, Masashi; Fujita, Yasunari
2016-12-01
Chenopodium quinoa Willd. (quinoa) originated from the Andean region of South America, and is a pseudocereal crop of the Amaranthaceae family. Quinoa is emerging as an important crop with the potential to contribute to food security worldwide and is considered to be an optimal food source for astronauts, due to its outstanding nutritional profile and ability to tolerate stressful environments. Furthermore, plant pathologists use quinoa as a representative diagnostic host to identify virus species. However, molecular analysis of quinoa is limited by its genetic heterogeneity due to outcrossing and its genome complexity derived from allotetraploidy. To overcome these obstacles, we established the inbred and standard quinoa accession Kd that enables rigorous molecular analysis, and present the draft genome sequence of Kd, obtained using an optimized combination of high-throughput next generation sequencing on the Illumina HiSeq 2500 and PacBio RS II sequencers. The de novo genome assembly contained 25 k scaffolds consisting of 1 Gbp with N50 length of 86 kbp. Based on these data, we constructed the free-access Quinoa Genome DataBase (QGDB). Thus, these findings provide insights into the mechanisms underlying agronomically important traits of quinoa and the effect of allotetraploidy on genome evolution. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Zaccaron, Alex Z; Woloshuk, Charles P; Bluhm, Burton H
2017-11-01
Stenocarpella maydis is a plant pathogenic fungus that causes Diplodia ear rot, one of the most destructive diseases of maize. To date, little information is available regarding the molecular basis of pathogenesis in this organism, in part due to limited genomic resources. In this study, a 54.8 Mb draft genome assembly of S. maydis was obtained with Illumina and PacBio sequencing technologies, and analyzed. Comparative genomic analyses with the predominant maize ear rot pathogens Aspergillus flavus, Fusarium verticillioides, and Fusarium graminearum revealed an expanded set of carbohydrate-active enzymes for cellulose and hemicellulose degradation in S. maydis. Analyses of predicted genes involved in starch degradation revealed six putative α-amylases, four extracellular and two intracellular, and two putative γ-amylases, one of which appears to have been acquired from bacteria via horizontal transfer. Additionally, 87 backbone genes involved in secondary metabolism were identified, which represents one of the largest known assemblages among Pezizomycotina species. Numerous secondary metabolite gene clusters were identified, including two clusters likely involved in the biosynthesis of diplodiatoxin and chaetoglobosins. The draft genome of S. maydis presented here will serve as a useful resource for molecular genetics, functional genomics, and analyses of population diversity in this organism. Copyright © 2017 British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Introduction to the fathead minnow genome browser and ...
Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minnow genomic sequence. This work is meant to extend the utility of the fathead minnow genome as a resource and enable the continued development of this species as a model organism. The fathead minnow (Pimephales promelas) is a laboratory model organism widely used in regulatory toxicity testing and ecotoxicology research. Despite the wealth of toxicological data for this organism, until recently genome-scale information was lacking for the species, which limited the utility of the species for pathway-based toxicity testing and research. As part of an EPA Pathfinder Innovation Project, next generation sequencing was applied to generate a draft genome assembly, which was published in 2016. However, application of those genome-scale sequencing resources was still limited by the lack of available gene annotations for fathead minnow. Here we report on development of a first generation genome annotation for fathead minnow and the dissemination of that information through a web-based browser that makes it easy to search for genes of interest, extract the corresponding sequence, identify intron and exon boundaries and regulatory regions, and align the computationally predicted genes with other supporti
Three draft genomes of Vibrio coralliilyticus strains isolated from bivalve hatcheries
USDA-ARS's Scientific Manuscript database
Reported here are the draft genomes of three Vibrio coralliilyticus isolates RE87, AIC-7, and 080116A. Each strain was isolated in association with diseased oyster larvae in commercial aquaculture systems. These draft genomes will be useful for further studies in understanding the genomic features...
Tirumalai, Madhan R; Stepanov, Victor G; Wünsche, Andrea; Montazari, Saied; Gonzalez, Racquel O; Venkateswaran, Kasturi; Fox, George E
2018-06-08
Bacillus strains producing highly resistant spores have been isolated from cleanrooms and spacecraft assembly facilities. Organisms that can survive such conditions merit planetary protection concern and, if that resistance can be transferred to other organisms, a health concern too. To further efforts to understand these resistances, the complete genome of Bacillus safensis strain FO-36b, which produces spores resistant to peroxide and radiation, was determined. The genome was compared to the complete genome of B. pumilus SAFR-032, and the draft genomes of B. safensis JPL-MERTA-8-2 and the type strain B. pumilus ATCC7061T. Additional comparisons were made to 61 draft genomes that have been mostly identified as strains of B. pumilus or B. safensis. The FO-36b gene order is essentially the same as that in SAFR-032 and other B. pumilus strains. The annotated genome has 3850 open reading frames and 40 noncoding RNAs and riboswitches. Of these, 307 are not shared by SAFR-032, and 65 are also not shared by MERTA and ATCC7061T. The FO-36b genome has ten unique open reading frames and two phage-like regions, homologous to the Bacillus bacteriophage SPP1 and Brevibacillus phage Jimmer1. Differing remnants of the Jimmer1 phage are found in essentially all B. safensis / B. pumilus strains. Seven unique genes are part of these phage elements. Whole-genome phylogenetic analysis of the B. pumilus, B. safensis and other Firmicutes genomes separates them into three distinct clusters. Two clusters are subgroups of B. pumilus while one houses all the B. safensis strains. The genome-genome distance analysis and a phylogenetic analysis of gyrA sequences corroborated these results. It is not immediately obvious that the presence or absence of any specific gene or combination of genes is responsible for the variations in resistance seen. 
It is quite possible that distinctions in gene regulation can alter the expression levels of key proteins thereby changing the organism's resistance properties without gain or loss of a particular gene. What is clear is that phage elements contribute significantly to genome variability. Multiple genome comparison indicates that many strains named as B. pumilus likely belong to the B. safensis group.
Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert
2015-01-01
With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.
Piombo, Edoardo; Sela, Noa; Wisniewski, Michael; Hoffmann, Maria; Gullino, Maria L.; Allard, Marc W.; Levin, Elena; Spadaro, Davide; Droby, Samir
2018-01-01
The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens have been suggested, including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using the PacBio data, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally large when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term) and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue. PMID:29666611
The draft genome of Ruellia speciosa (Beautiful Wild Petunia: Acanthaceae).
Zhuang, Yongbin; Tripp, Erin A
2017-04-01
The genus Ruellia (Wild Petunias; Acanthaceae) is characterized by an enormous diversity of floral shapes and colours manifested among closely related species. Using the Illumina platform, we reconstructed the draft genome of Ruellia speciosa, with a scaffold size of 1,021 Mb (or ∼1.02 Gb) and an N50 size of 17,908 bp, spanning ∼93% of the estimated genome (∼1.1 Gb). The draft assembly predicted 40,124 gene models and phylogenetic analyses of four key enzymes involved in anthocyanin colour production [flavanone 3-hydroxylase (F3H), flavonoid 3'-hydroxylase (F3'H), flavonoid 3',5'-hydroxylase (F3'5'H), and dihydroflavonol 4-reductase (DFR)] found that most angiosperms here sampled harboured at least one copy of F3H, F3'H, and DFR. In contrast, fewer than one-half (but including R. speciosa) harboured a copy of F3'5'H, supporting observations that blue flowers and/or fruits, which this enzyme is required for, are less common among flowering plants. Ka/Ks analyses of duplicated copies of F3'H and DFR in R. speciosa suggested purifying selection in the former but detected evidence of positive selection in the latter. The genome sequence and annotation of R. speciosa represents one of only four families sequenced in the large and important Asterid clade of flowering plants and, as such, will facilitate extensive future research on this diverse group, particularly with respect to floral evolution. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
The draft genome of Ruellia speciosa (Beautiful Wild Petunia: Acanthaceae)
Zhuang, Yongbin
2017-01-01
Abstract The genus Ruellia (Wild Petunias; Acanthaceae) is characterized by an enormous diversity of floral shapes and colours manifested among closely related species. Using the Illumina platform, we reconstructed the draft genome of Ruellia speciosa, with a scaffold size of 1,021 Mb (or ∼1.02 Gb) and an N50 size of 17,908 bp, spanning ∼93% of the estimated genome (∼1.1 Gb). The draft assembly predicted 40,124 gene models and phylogenetic analyses of four key enzymes involved in anthocyanin colour production [flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), flavonoid 3′,5′-hydroxylase (F3′5′H), and dihydroflavonol 4-reductase (DFR)] found that most angiosperms here sampled harboured at least one copy of F3H, F3′H, and DFR. In contrast, fewer than one-half (but including R. speciosa) harboured a copy of F3′5′H, supporting observations that blue flowers and/or fruits, which this enzyme is required for, are less common among flowering plants. Ka/Ks analyses of duplicated copies of F3′H and DFR in R. speciosa suggested purifying selection in the former but detected evidence of positive selection in the latter. The genome sequence and annotation of R. speciosa represents one of only four families sequenced in the large and important Asterid clade of flowering plants and, as such, will facilitate extensive future research on this diverse group, particularly with respect to floral evolution. PMID:28431014
Zhang, Ya; Kitajima, Masaaki; Whittle, Andrew J.; Liu, Wen-Tso
2017-01-01
The occurrence of pathogenic bacteria in drinking water distribution systems (DWDSs) is a major health concern, and our current understanding is mostly related to pathogenic species such as Legionella pneumophila and Mycobacterium avium but not to bacterial species closely related to them. In this study, genomic-based approaches were used to characterize pathogen-related species in relation to their abundance, diversity, potential pathogenicity, genetic exchange, and distribution across an urban drinking water system. Nine draft genomes recovered from 10 metagenomes were identified as Legionella (4 draft genomes), Mycobacterium (3 draft genomes), Parachlamydia (1 draft genome), and Leptospira (1 draft genome). The pathogenicity potential of these genomes was examined by the presence/absence of virulence machinery, including genes belonging to Type III, IV, and VII secretion systems and their effectors. Several virulence factors known from pathogenic species were detected in these retrieved draft genomes, except the Leptospira-related genome. Identical clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins (CRISPR-Cas) genetic signatures were observed in two draft genomes recovered at different stages of the studied system, suggesting that the spacers in CRISPR-Cas could potentially be used as a biomarker in the monitoring of Legionella-related strains at an evolutionary scale of several years across different drinking water production and distribution systems. Overall, the metagenomics approach was an effective tool, complementary to culturing techniques, for gaining insights into the pathogenic characteristics and the CRISPR-Cas signatures of pathogen-related species in DWDSs. PMID:29097994
Draft Genome Sequence of a Virulent Strain of Pasteurella Multocida Isolated From Alpaca
Hurtado, Raquel Enma; Aburjaile, Flavia; Mariano, Diego; Canário, Marcus Vinicius; Benevides, Leandro; Fernandez, Daniel Antonio; Allasi, Nataly Olivia; Rimac, Rocio; Juscamayta, Julio Eduardo; Maximiliano, Jorge Enrique; Rosadio, Raul Hector; Azevedo, Vasco; Maturrano, Lenin
2017-01-01
Pasteurella multocida is one of the most frequently isolated bacteria in acute pneumonia cases, being responsible for high mortality rates in Peruvian young alpacas, with consequent social and economic costs. Here we report the genome sequence of P. multocida strain UNMSM, isolated from the lung of an alpaca diagnosed with pneumonia, in Peru. The genome consists of 2,439,814 base pairs assembled into 82 contigs and 2,252 protein encoding genes, revealing the presence of known virulence-associated genes (ompH, ompA, tonB, tbpA, nanA, nanB, nanH, sodA, sodC, plpB and toxA). Further analysis could provide insights about bacterial pathogenesis and control strategies of this disease in Peruvian alpacas. PMID:28698737
Pathak, Ashish; Chauhan, Ashvini; Ewida, Ayman Y.I.; Stothard, Paul
2016-01-01
We recently isolated Micrococcus sp. strain 2385 from Ochlockonee River, Florida and demonstrated potent biodegradative activity against two commonly used pesticides, alachlor [(2-chloro-2`,6`-diethylphenyl-N (methoxymethyl)acetanilide)] and endosulfan [(6,7,8,9,10,10-hexachloro-1,5,5a,6,9,9a-hexahydro-6,9methano-2,3,4-benzo(e)di-oxathiepin-3-oxide)]. To further identify the repertoire of metabolic functions possessed by strain 2385, a draft genome sequence was obtained, assembled, annotated and analyzed. The genome sequence of Micrococcus sp. strain 2385 consisted of 1,460,461,440 bases which assembled into 175 contigs with an N50 contig length of 50,109 bases and a coverage of 600x. The genome size of this strain was estimated at 2,431,226 base pairs with a G+C content of 72.8% and a total number of 2,268 putative genes. RAST annotated a total of 340 subsystems in the genome of strain 2385 along with the presence of 2,177 coding sequences. A genome-wide survey indicated that strain 2385 harbors a plethora of genes to degrade other pollutants including caprolactam, PAHs (such as naphthalene), styrene, toluene and several chloroaromatic compounds. PMID:27672405
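The 600x coverage quoted above is consistent with dividing the total sequenced bases by the estimated genome size. A minimal sketch of that arithmetic, using only the two figures from the abstract (the function name `fold_coverage` is illustrative and not taken from the paper):

```python
def fold_coverage(total_sequenced_bases: int, genome_size_bp: int) -> float:
    """Average depth of coverage = sequenced bases / estimated genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bases / genome_size_bp


# Figures quoted in the abstract for Micrococcus sp. strain 2385.
coverage = fold_coverage(1_460_461_440, 2_431_226)
print(f"approximately {coverage:.1f}x")
```

The result falls just above 600, which matches the rounded value reported in the abstract.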
Chen, Shicheng; Soehnlen, Marty; Downes, Frances P; Walker, Edward D
2017-01-01
Elizabethkingia meningoseptica is an emerging, healthcare-associated pathogen causing a high mortality rate in immunocompromised patients. We report the draft genome sequence of E. meningoseptica Em3, isolated from sputum from a patient with multiple underlying diseases. The genome has a length of 4,037,922 bp, a GC content of 36.4%, and 3673 predicted protein-coding sequences. Average nucleotide identity analysis (>95%) assigned the bacterium to the species E. meningoseptica. Genome analysis showed the presence of the curli formation and assembly operon and a gene encoding hemagglutinins, indicating an ability to form biofilm. In vitro biofilm assays demonstrated that E. meningoseptica Em3 formed more biofilm than E. anophelis Ag1 and E. miricola Emi3, both of which lack the curli operon. A gene encoding thiol-activated cholesterol-dependent cytolysin in E. meningoseptica Em3 (potentially involved in lysing host immune cells) was also absent in E. anophelis Ag1 and E. miricola Emi3. Strain Em3 showed α-hemolysin activity on blood agar medium, congruent with the presence of hemolysin and cytolysin genes. Furthermore, the presence of heme uptake and utilization genes demonstrated adaptations for bloodstream infections. Strain Em3 contained 12 genes conferring resistance to β-lactams, including β-lactamases class A, class B, and metallo-β-lactamases. Results of comparative genomic analysis here provide insights into the evolution of E. meningoseptica Em3 as a pathogen.
Verde, Ignazio; Abbott, Albert G; Scalabrin, Simone; Jung, Sook; Shu, Shengqiang; Marroni, Fabio; Zhebentyayeva, Tatyana; Dettori, Maria Teresa; Grimwood, Jane; Cattonaro, Federica; Zuccolo, Andrea; Rossini, Laura; Jenkins, Jerry; Vendramin, Elisa; Meisel, Lee A; Decroocq, Veronique; Sosinski, Bryon; Prochnik, Simon; Mitros, Therese; Policriti, Alberto; Cipriani, Guido; Dondini, Luca; Ficklin, Stephen; Goodstein, David M; Xuan, Pengfei; Del Fabbro, Cristian; Aramini, Valeria; Copetti, Dario; Gonzalez, Susana; Horner, David S; Falchi, Rachele; Lucas, Susan; Mica, Erica; Maldonado, Jonathan; Lazzari, Barbara; Bielenberg, Douglas; Pirona, Raul; Miculan, Mara; Barakat, Abdelali; Testolin, Raffaele; Stella, Alessandra; Tartarini, Stefano; Tonutti, Pietro; Arús, Pere; Orellana, Ariel; Wells, Christina; Main, Dorrie; Vizzotto, Giannina; Silva, Herman; Salamini, Francesco; Schmutz, Jeremy; Morgante, Michele; Rokhsar, Daniel S
2013-05-01
Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.
Genome and transcriptome of the porcine whipworm Trichuris suis
Jex, Aaron R.; Nejsum, Peter; Schwarz, Erich M.; Hu, Li; Young, Neil D.; Hall, Ross S.; Korhonen, Pasi K.; Liao, Shengguang; Thamsborg, Stig; Xia, Jinquan; Xu, Pengwei; Wang, Shaowei; Scheerlinck, Jean-Pierre Y.; Hofmann, Andreas; Sternberg, Paul W.; Wang, Jun; Gasser, Robin B.
2014-01-01
Trichuris (whipworm) infects 1 billion people worldwide, and causes a disease (trichuriasis) that results in major socioeconomic losses in both humans and pigs. Trichuriasis relates to an inflammation of the large intestine manifested in bloody diarrhoea, and chronic disease can cause malnourishment and stunting in children. Paradoxically, Trichuris of pigs has shown substantial promise as a treatment for human autoimmune disorders, including inflammatory bowel disease (IBD) and multiple sclerosis (MS). Here, we report ~80 megabase (Mb) draft assemblies of the genomes of adult male and female T. suis, and explore stage-, sex- and tissue-specific transcription of messenger and small non-coding RNAs. PMID:24929829
Genome and transcriptome of the porcine whipworm Trichuris suis.
Jex, Aaron R; Nejsum, Peter; Schwarz, Erich M; Hu, Li; Young, Neil D; Hall, Ross S; Korhonen, Pasi K; Liao, Shengguang; Thamsborg, Stig; Xia, Jinquan; Xu, Pengwei; Wang, Shaowei; Scheerlinck, Jean-Pierre Y; Hofmann, Andreas; Sternberg, Paul W; Wang, Jun; Gasser, Robin B
2014-07-01
Trichuris (whipworm) infects 1 billion people worldwide and causes a disease (trichuriasis) that results in major socioeconomic losses in both humans and pigs. Trichuriasis relates to an inflammation of the large intestine manifested in bloody diarrhea, and chronic disease can cause malnourishment and stunting in children. Paradoxically, Trichuris of pigs has shown substantial promise as a treatment for human autoimmune disorders, including inflammatory bowel disease (IBD) and multiple sclerosis. Here we report whole-genome sequencing at ∼140-fold coverage of adult male and female T. suis and ∼80-Mb draft assemblies. We explore stage-, sex- and tissue-specific transcription of mRNAs and small noncoding RNAs.
Lin, Yuling; Min, Jiumeng; Lai, Ruilian; Wu, Zhangyan; Chen, Yukun; Yu, Lili; Cheng, Chunzhen; Jin, Yuanchun; Tian, Qilin; Liu, Qingfeng; Liu, Weihua; Zhang, Chengguang; Lin, Lixia; Hu, Yan; Zhang, Dongmin; Thu, Minkyaw; Zhang, Zihao; Liu, Shengcai; Zhong, Chunshui; Fang, Xiaodong; Wang, Jian; Yang, Huanming
2017-01-01
Longan (Dimocarpus longan Lour.), an important subtropical fruit in the family Sapindaceae, is grown in more than 10 countries. Longan is an edible drupe fruit and a source of traditional medicine with polyphenol-rich traits. Tree size, alternate bearing, and witches' broom disease still pose serious problems. To gain insights into the genomic basis of longan traits, a draft genome sequence was assembled. The draft genome (about 471.88 Mb) of a Chinese longan cultivar, “Honghezi,” was estimated to contain 31 007 genes and 261.88 Mb of repetitive sequences. No recent whole-genome duplication event was detected in the genome. Whole-genome resequencing and analysis of 13 cultivated D. longan accessions revealed the extent of genetic diversity. Comparative transcriptome studies combined with genome-wide analysis revealed polyphenol-rich and pathogen resistance characteristics. Genes involved in secondary metabolism, especially those from significantly expanded (DHS, SDH, F3′H, ANR, and UFGT) and contracted (PAL, CHS, and F3′5′H) gene families with tissue-specific expression, may be important contributors to the high accumulation levels of polyphenolic compounds observed in longan fruit. The high number of genes encoding nucleotide-binding site leucine-rich repeat (NBS-LRR) and leucine-rich repeat receptor-like kinase proteins, as well as the recent expansion and contraction of the NBS-LRR family, suggested a genomic basis for resistance to insects, fungi, and bacteria in this fruit tree. These data provide insights into the evolution and diversity of the longan genome. The comparative genomic and transcriptome analyses provided information about longan-specific traits, particularly genes involved in its polyphenol-rich and pathogen resistance characteristics. PMID:28368449
Speth, Daan R; Lagkouvardos, Ilias; Wang, Yong; Qian, Pei-Yuan; Dutilh, Bas E; Jetten, Mike S M
2017-07-01
Several recent studies have indicated that members of the phylum Planctomycetes are abundantly present at the brine-seawater interface (BSI) above multiple brine pools in the Red Sea. Planctomycetes include bacteria capable of anaerobic ammonium oxidation (anammox). Here, we investigated the possibility of anammox at BSI sites using metagenomic shotgun sequencing of DNA obtained from the BSI above the Discovery Deep brine pool. Analysis of sequencing reads matching the 16S rRNA and hzsA genes confirmed presence of anammox bacteria of the genus Scalindua. Phylogenetic analysis of the 16S rRNA gene indicated that this Scalindua sp. belongs to a distinct group, separate from the anammox bacteria in the seawater column, that contains mostly sequences retrieved from high-salt environments. Using coverage- and composition-based binning, we extracted and assembled the draft genome of the dominant anammox bacterium. Comparative genomic analysis indicated that this Scalindua species uses compatible solutes for osmoadaptation, in contrast to other marine anammox bacteria that likely use a salt-in strategy. We propose the name Candidatus Scalindua rubra for this novel species, alluding to its discovery in the Red Sea.
Genome sequencing of the winged midge, Parochlus steinenii, from the Antarctic Peninsula
Kim, Sanghee; Oh, Mijin; Jung, Woongsic; Park, Joonho; Choi, Han-Gu
2017-01-01
Background: In the Antarctic, only two species of Chironomidae occur naturally—the wingless midge, Belgica antarctica, and the winged midge, Parochlus steinenii. B. antarctica is an extremophile with unusual adaptations. The larvae of B. antarctica are desiccation- and freeze-tolerant and the adults are wingless. Recently, the compact genome of B. antarctica was reported and it is the first Antarctic eukaryote to be sequenced. Although P. steinenii occurs naturally in the Antarctic with B. antarctica, the larvae of P. steinenii are cold-tolerant but not freeze-tolerant and the adults are winged. Differences in adaptations in the Antarctic midges are interesting in terms of evolutionary processes within an extreme environment. Herein, we provide the genome of another Antarctic midge to help elucidate the evolution of these species. Results: The draft genome of P. steinenii had a total size of 138 Mbp, comprising 9513 contigs with an N50 contig size of 34,110 bp, and a GC content of 32.2%. Overall, 13,468 genes were predicted using the MAKER annotation pipeline, and gene ontology classified 10,801 (80.2%) predicted genes to a function. Compared with the assembled genome architecture of B. antarctica, that of P. steinenii was approximately 50 Mbp longer with 6.2-fold more repeat sequences, whereas gene regions were as similarly compact as in B. antarctica. Conclusions: We present an annotated draft genome of the Antarctic midge, P. steinenii. The genomes of P. steinenii and B. antarctica will aid in the elucidation of evolution in harsh environments and provide new resources for functional genomic analyses of the order Diptera. PMID:28327954
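Several abstracts in this listing report an N50 contig or scaffold size. As a hedged illustration of what that statistic measures, the sketch below computes N50 under its usual definition (the contig length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly). The contig lengths are hypothetical toy values, not data from any of the listed assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    total first reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp), total = 300 bp.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

A larger N50 for the same total size indicates that more of the assembly sits in long contigs, which is why the statistic is commonly quoted alongside contig counts.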
Feuk, Lars; MacDonald, Jeffrey R; Tang, Terence; Carson, Andrew R; Li, Martin; Rao, Girish; Khaja, Razi; Scherer, Stephen W
2005-10-01
With a draft genome-sequence assembly for the chimpanzee available, it is now possible to perform genome-wide analyses to identify, at a submicroscopic level, structural rearrangements that have occurred between chimpanzees and humans. The goal of this study was to investigate chromosomal regions that are inverted between the chimpanzee and human genomes. Using the net alignments for the builds of the human and chimpanzee genome assemblies, we identified a total of 1,576 putative regions of inverted orientation, covering more than 154 mega-bases of DNA. The DNA segments are distributed throughout the genome and range from 23 base pairs to 62 mega-bases in length. For the 66 inversions more than 25 kilobases (kb) in length, 75% were flanked on one or both sides by (often unrelated) segmental duplications. Using PCR and fluorescence in situ hybridization we experimentally validated 23 of 27 (85%) semi-randomly chosen regions; the largest novel inversion confirmed was 4.3 mega-bases at human Chromosome 7p14. Gorilla was used as an out-group to assign ancestral status to the variants. All experimentally validated inversion regions were then assayed against a panel of human samples and three of the 23 (13%) regions were found to be polymorphic in the human genome. These polymorphic inversions include 730 kb (at 7p22), 13 kb (at 7q11), and 1 kb (at 16q24) fragments with a 5%, 30%, and 48% minor allele frequency, respectively. Our results suggest that inversions are an important source of variation in primate genome evolution. The finding of at least three novel inversion polymorphisms in humans indicates this type of structural variation may be a more common feature of our genome than previously realized.
The draft genome of the carcinogenic human liver fluke Clonorchis sinensis
2011-01-01
Background: Clonorchis sinensis is a carcinogenic human liver fluke that is widespread in Asian countries. Increasing infection rates of this neglected tropical disease are leading to negative economic and public health consequences in affected regions. Experimental and epidemiological studies have shown a strong association between the incidence of cholangiocarcinoma and the infection rate of C. sinensis. To aid research into this organism, we have sequenced its genome. Results: We combined de novo sequencing with computational techniques to provide new information about the biology of this liver fluke. The assembled genome has a total size of 516 Mb with a scaffold N50 length of 42 kb. Approximately 16,000 reliable protein-coding gene models were predicted. Genes for the complete pathways for glycolysis, the Krebs cycle and fatty acid metabolism were found, but key genes involved in fatty acid biosynthesis are missing from the genome, reflecting the parasitic lifestyle of a liver fluke that receives lipids from the bile of its host. We also identified pathogenic molecules that may contribute to liver fluke-induced hepatobiliary diseases. Large proteins such as multifunctional secreted proteases and tegumental proteins were identified as potential targets for the development of drugs and vaccines. Conclusions: This study provides valuable genomic information about the human liver fluke C. sinensis and adds to our knowledge on the biology of the parasite. The draft genome will serve as a platform to develop new strategies for parasite control. PMID:22023798
Berendsen, Erwin M.; Wells-Bennik, Marjon H. J.; Krawczyk, Antonina O.; de Jong, Anne; van Heel, Auke; Holsappel, Siger; Eijlander, Robyn T.
2016-01-01
Here, we report the draft genomes of five strains of Geobacillus spp., one Caldibacillus debilis strain, and one draft genome of Anoxybacillus flavithermus, all thermophilic spore-forming Gram-positive bacteria. PMID:27151781
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A.
2016-01-01
Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318200
Bielaszewska, Martina; Karch, Helge; Toth, Ian K.
2012-01-01
Background: An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Methodology/Principal Findings: Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 ‘positive’ E. coli O104:H4 outbreak and 32 ‘negative’ non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Conclusions/Significance: Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact. PMID:22496820
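The validation figures in this abstract (individual primer sets with 100% sensitivity but 9% to 22% false positives, and a two-primer combination with 100% specificity) can be illustrated with a small sketch. Everything below is hypothetical: the per-isolate calls are invented so that the rates fall in the reported range, and the rule that an isolate counts as outbreak-positive only when both primer sets amplify is an assumption, since the abstract does not state how the two primers were combined.

```python
def positive_rate(calls):
    """Fraction of isolates called positive by a primer set."""
    return sum(calls) / len(calls)


def combine_and(calls_a, calls_b):
    """Assumed combination rule: positive only if both primer sets amplify."""
    return [a and b for a, b in zip(calls_a, calls_b)]


# Panel sizes from the abstract: 21 outbreak and 32 non-outbreak isolates.
N_OUTBREAK, N_NON_OUTBREAK = 21, 32

# Hypothetical calls: both primer sets amplify every outbreak isolate.
primer_a_outbreak = [True] * N_OUTBREAK
primer_b_outbreak = [True] * N_OUTBREAK

# Hypothetical calls: each primer set cross-reacts with a different,
# non-overlapping subset of the non-outbreak isolates.
primer_a_other = [i < 3 for i in range(N_NON_OUTBREAK)]          # 3 of 32
primer_b_other = [10 <= i < 17 for i in range(N_NON_OUTBREAK)]   # 7 of 32

combined_outbreak = combine_and(primer_a_outbreak, primer_b_outbreak)
combined_other = combine_and(primer_a_other, primer_b_other)

print(f"primer A false positive rate: {positive_rate(primer_a_other):.1%}")
print(f"primer B false positive rate: {positive_rate(primer_b_other):.1%}")
print(f"combined sensitivity: {positive_rate(combined_outbreak):.0%}")
print(f"combined specificity: {1 - positive_rate(combined_other):.0%}")
```

The point of the sketch is that two imperfect primer sets can be perfectly specific together when their false positives do not overlap, while sensitivity is preserved as long as each set detects every outbreak isolate.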
Kavousi, Niloofar; Eng, Wilhelm Wei Han; Lee, Yin Peng; Tan, Lian Huat; Thuraisingham, Ravindran; Yule, Catherine M; Gan, Han Ming
2016-03-03
We report here the first high-quality draft genome sequence of Pasteurella multocida sequence type 128, which was isolated from the infected finger bone of an adult female who was bitten by a domestic dog. The draft genome will be a valuable addition to the scarce genomic resources available for P. multocida. Copyright © 2016 Kavousi et al.
ProDeGe: A computational protocol for fully automated decontamination of genomes
Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; ...
2015-06-09
Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
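The abstract defines specificity as the fraction of non-target sequence removed and sensitivity as the fraction of target sequence retained, and it quotes a cost of about 0.30 CPU core hours per megabase. The sketch below works through those definitions. It is not ProDeGe code: the function names and the base-pair counts are hypothetical, chosen only so that both rates come out at the reported 84%.

```python
def decontamination_rates(target_kept_bp, target_total_bp,
                          contaminant_removed_bp, contaminant_total_bp):
    """Return (sensitivity, specificity) as defined in the abstract.

    sensitivity = target sequence retained / all target sequence
    specificity = non-target sequence removed / all non-target sequence
    """
    sensitivity = target_kept_bp / target_total_bp
    specificity = contaminant_removed_bp / contaminant_total_bp
    return sensitivity, specificity


def estimated_cpu_core_hours(assembly_size_mb, hours_per_mb=0.30):
    """Runtime estimate from the approximate per-megabase rate in the abstract."""
    return assembly_size_mb * hours_per_mb


# Hypothetical draft genome: 2.0 Mb of target and 0.5 Mb of contaminant sequence.
sens, spec = decontamination_rates(
    target_kept_bp=1_680_000, target_total_bp=2_000_000,
    contaminant_removed_bp=420_000, contaminant_total_bp=500_000,
)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
print(f"estimated cost for a 5 Mb assembly: {estimated_cpu_core_hours(5):.1f} CPU core hours")
```

Measuring both rates in base pairs rather than in contig counts follows the abstract's wording ("84% of sequence"), so a single long contig weighs more than many short ones.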
2014-01-01
The Bactrian camel (Camelus bactrianus) and the dromedary (Camelus dromedarius) are among the last species to have been domesticated, around 3000–6000 years ago. During domestication, strong artificial (anthropogenic) selection has shaped livestock, creating a large number of phenotypes and breeds. Hence, domestic animals represent a unique resource to understand the genetic basis of phenotypic variation and adaptation. In keeping with its late domestication history, the Bactrian camel is also among the last livestock animals to have its genome sequenced and deciphered. As no genomic data have been available until recently, we generated a de novo assembly by shotgun sequencing of a single male Bactrian camel. We obtained 1.6 Gb of genomic sequence, which corresponds to more than half of the Bactrian camel’s genome. The aim of this study was to identify heterozygous single-nucleotide polymorphisms (SNPs) and to estimate population parameters and nucleotide diversity based on an individual camel. With an average 6.6-fold coverage, we detected over 116 000 heterozygous SNPs and recorded a genome-wide nucleotide diversity similar to that of other domesticated ungulates. More than 20 000 (85%) dromedary expressed sequence tags successfully aligned to our genomic draft. Our results provide a template for future association studies targeting economically relevant traits and for identifying changes underlying the process of camel domestication and environmental adaptation. PMID:23454912
Draft genome of the American Eel (Anguilla rostrata).
Pavey, Scott A; Laporte, Martin; Normandeau, Eric; Gaudin, Jérémy; Letourneau, Louis; Boisvert, Sébastien; Corbeil, Jacques; Audet, Céline; Bernatchez, Louis
2017-07-01
Freshwater eels (Anguilla sp.) have large economic, cultural, ecological and aesthetic importance worldwide, but they have suffered a more than 90% decline in global stocks over the past few decades. Proper genetic resources, such as sequenced, assembled and annotated genomes, are essential to help plan sustainable recoveries by identifying physiological, biochemical and genetic mechanisms that caused the declines or that may lead to recoveries. Here, we present the first sequenced genome of the American eel. This genome contained 305 043 contigs (N50 = 7397) and 79 209 scaffolds (N50 = 86 641) for a total size of 1.41 Gb, which is in the middle of the range of previous estimations for this species. In addition, protein-coding regions, including introns and flanking regions, are very well represented in the genome, as 95.2% of the 458 core eukaryotic genes and 98.8% of the 248 ultra-conserved subset were represented in the assembly and a total of 26 564 genes were annotated for future functional genomics studies. We performed a candidate gene analysis to compare three genes among all three freshwater eel species and, congruent with the phylogenetic relationships, Japanese eel (A. japonica) exhibited the most divergence. Overall, the sequenced genome presented in this study is a crucial addition to the presently available genetic tools to help guide future conservation efforts of freshwater eels. © 2016 John Wiley & Sons Ltd.
Tollis, Marc; DeNardo, Dale F.; Cornelius, John A.; Dolby, Greer A.; Edwards, Taylor; Henen, Brian T.; Karl, Alice E.; Murphy, Robert W.; Kusumi, Kenro
2017-01-01
Agassiz’s desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid conservation efforts for preserving the genetic diversity of this species, we generated a whole genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation reveals 20,172 protein-coding genes in the G. agassizii assembly, and that gene structure is more similar to chicken than other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation is the first such resource for any tortoise, and will serve as a foundation for future analysis of the genetic basis of adaptations to the desert environment, allow for investigation into genomic factors affecting tortoise health, disease and longevity, and serve as a valuable resource for additional studies in this species complex. PMID:28562605
2014-12-11
Cassava (Manihot esculenta Crantz) is a major staple crop in Africa, Asia, and South America, and its starchy roots provide nourishment for 800 million people worldwide. Although native to South America, cassava was brought to Africa 400-500 years ago and is now widely cultivated across sub-Saharan Africa, but it is subject to biotic and abiotic stresses. To assist in the rapid identification of markers for pathogen resistance and crop traits, and to accelerate breeding programs, we generated a framework map for M. esculenta Crantz from reduced representation sequencing [genotyping-by-sequencing (GBS)]. The composite 2412-cM map integrates 10 biparental maps (comprising 3480 meioses) and organizes 22,403 genetic markers on 18 chromosomes, in agreement with the observed karyotype. We used the map to anchor 71.9% of the draft genome assembly and 90.7% of the predicted protein-coding genes. The chromosome-anchored genome sequence will be useful for breeding improvement by assisting in the rapid identification of markers linked to important traits, and in providing a framework for genomic selection-enhanced breeding of this important crop. Copyright © 2015 International Cassava Genetic Map Consortium (ICGMC).
Lyons, Jessica
2014-12-11
Cassava (Manihot esculenta Crantz) is a major staple crop in Africa, Asia, and South America, and its starchy roots provide nourishment for 800 million people worldwide. Although native to South America, cassava was brought to Africa 400–500 years ago and is now widely cultivated across sub-Saharan Africa, but it is subject to biotic and abiotic stresses. To assist in the rapid identification of markers for pathogen resistance and crop traits, and to accelerate breeding programs, we generated a framework map for M. esculenta Crantz from reduced representation sequencing [genotyping-by-sequencing (GBS)]. The composite 2412-cM map integrates 10 biparental maps (comprising 3480 meioses) and organizes 22,403 genetic markers on 18 chromosomes, in agreement with the observed karyotype. Here, we used the map to anchor 71.9% of the draft genome assembly and 90.7% of the predicted protein-coding genes. The chromosome-anchored genome sequence will be useful for breeding improvement by assisting in the rapid identification of markers linked to important traits, and in providing a framework for genomic selection-enhanced breeding of this important crop.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyons, Jessica
Cassava (Manihot esculenta Crantz) is a major staple crop in Africa, Asia, and South America, and its starchy roots provide nourishment for 800 million people worldwide. Although native to South America, cassava was brought to Africa 400–500 years ago and is now widely cultivated across sub-Saharan Africa, but it is subject to biotic and abiotic stresses. To assist in the rapid identification of markers for pathogen resistance and crop traits, and to accelerate breeding programs, we generated a framework map for M. esculenta Crantz from reduced representation sequencing [genotyping-by-sequencing (GBS)]. The composite 2412-cM map integrates 10 biparental maps (comprising 3480 meioses) and organizes 22,403 genetic markers on 18 chromosomes, in agreement with the observed karyotype. Here, we used the map to anchor 71.9% of the draft genome assembly and 90.7% of the predicted protein-coding genes. The chromosome-anchored genome sequence will be useful for breeding improvement by assisting in the rapid identification of markers linked to important traits, and in providing a framework for genomic selection-enhanced breeding of this important crop.
Tripathi, Charu; Mahato, Nitish K; Rani, Pooja; Singh, Yogendra; Kamra, Komal; Lal, Rup
2016-01-01
Lampropedia cohaerens strain CT6(T), a non-motile, aerobic and coccoid strain, was isolated from arsenic-rich microbial mats (temperature ~45 °C) of a hot water spring located atop the Himalayan ranges at Manikaran, India. The present study reports the first genome sequence of type strain CT6(T) of genus Lampropedia cohaerens. Sequencing data were generated using the Illumina HiSeq 2000 platform and assembled with ABySS v 1.3.5. The 3,158,922 bp genome was assembled into 41 contigs with a mean GC content of 63.5 % and 2823 coding sequences. Strain CT6(T) was found to harbour genes involved in both the Entner-Doudoroff pathway and non-phosphorylated ED pathway. Strain CT6(T) also contained genes responsible for imparting resistance to arsenic, copper, cobalt, zinc, cadmium and magnesium, providing survival advantages at a thermal location. Additionally, the presence of genes associated with biofilm formation, pyrroloquinoline-quinone production, isoquinoline degradation and mineral phosphate solubilisation in the genome demonstrates the diverse genetic potential for survival at stressed niches.
Discovery of a Novel Periodontal Disease-Associated Bacterium.
Torres, Pedro J; Thompson, John; McLean, Jeffrey S; Kelley, Scott T; Edlund, Anna
2018-06-02
One of the world's most common infectious diseases, periodontitis (PD), derives from largely uncharacterized communities of oral bacteria growing as biofilms (a.k.a. plaque) on teeth and gum surfaces in periodontal pockets. Bacteria associated with periodontal disease trigger inflammatory responses in immune cells, which in later stages of the disease cause loss of both soft and hard tissue structures supporting teeth. Thus far, only a handful of bacteria have been characterized as infectious agents of PD. Although deep sequencing technologies, such as whole community shotgun sequencing, have the potential to capture a detailed picture of highly complex bacterial communities in any given environment, we still lack major reference genomes for the oral microbiome associated with PD and other diseases. In recent work, by using a combination of supervised machine learning and genome assembly, we identified a genome from a novel member of the Bacteroidetes phylum in periodontal samples. Here, by applying a comparative metagenomics read-classification approach, including 272 metagenomes from various human body sites, and our previously assembled draft genome of the uncultivated Candidatus Bacteroides periocalifornicus (CBP) bacterium, we show CBP's ubiquitous distribution in dental plaque, as well as its strong association with the well-known pathogenic "red complex" that resides in deep periodontal pockets.
Metagenomic Analysis of Therapeutic PYO Phage Cocktails from 1997 to 2014
Larsen, Mette Voldby
2017-01-01
Phage therapy has regained interest in recent years due to the alarming spread of antibiotic resistance. Whilst phage cocktails are commonly sold in pharmacies in countries such as Georgia and Russia, this is not the case in western countries due to western regulatory agencies requiring a thorough characterization of the drug. Here, DNA sequencing of constituent biological entities constitutes a first step. The pyophage (PYO) cocktail is one of the main commercial products of the Georgian Eliava Institute of Bacteriophage, Microbiology and Virology and is used to cure skin infections. Since its first production in the 1930s, the composition of the cocktail has been periodically modified to add phages effective against emerging pathogenic strains. In this paper, we compared the composition of three PYO cocktails from 1997 (PYO97), 2000 (PYO2000) and 2014 (PYO2014). Based on next generation sequencing, de novo assembly and binning of contigs into draft genomes based on tetranucleotide distance, thirty and twenty-nine phage draft genomes were predicted in PYO97 and PYO2014, respectively. Of these, thirteen and fifteen shared high similarity to known phages. Eleven draft genomes were found to be common in the two cocktails. One of these showed no similarity to publicly available phage genomes. Representatives of phages targeting E. faecalis, E. faecium, E. coli, Proteus, P. aeruginosa and S. aureus were found in both cocktails. Finally, we estimated larger overlap of the PYO2000 cocktail to PYO97 compared to PYO2014. Using next generation sequencing and metagenomics analysis, we were able to characterize and compare the content of PYO cocktails separated by 17 years in time. Even though the cocktail composition is upgraded every six months, we found it to remain relatively stable over the years. PMID:29099783
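The binning step mentioned in this abstract groups contigs by tetranucleotide distance. The abstract does not say which distance measure or clustering procedure was used, so the following is only a minimal sketch of the general idea: it assumes a Euclidean distance between normalized 4-mer frequency vectors, and the function names `tetra_freqs` and `tetra_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# All 256 possible tetranucleotides over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_freqs(seq):
    """Return the normalized tetranucleotide frequency vector of a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only count 4-mers made of A, C, G, T (ambiguous bases such as N are skipped).
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]

def tetra_distance(seq_a, seq_b):
    """Euclidean distance between two tetranucleotide profiles (an assumed metric)."""
    fa, fb = tetra_freqs(seq_a), tetra_freqs(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

# Toy contigs, not real phage data.
same = tetra_distance("ACGTACGTACGTACGT", "ACGTACGTACGTACGT")
different = tetra_distance("ACGTACGTACGTACGT", "AAAAAAAAAAAAAAAA")
```

Contigs whose pairwise distance falls below some chosen threshold would then be placed in the same bin; the threshold and the clustering method are further assumptions that the abstract leaves open.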
The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lang, Daniel; Ullrich, Kristian K.; Murat, Florent
Here, the draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution
Lang, Daniel; Ullrich, Kristian K.; Murat, Florent; ...
2017-12-13
Here, the draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
Rao, Soumya; Nandineni, Madhusudan R
2017-01-01
Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.
Rao, Soumya
2017-01-01
Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens. PMID:28846714
Draft genome of the mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest
2013-01-01
Background The mountain pine beetle, Dendroctonus ponderosae Hopkins, is the most serious insect pest of western North American pine forests. A recent outbreak destroyed more than 15 million hectares of pine forests, with major environmental effects on forest health, and economic effects on the forest industry. The outbreak has in part been driven by climate change, and will contribute to increased carbon emissions through decaying forests. Results We developed a genome sequence resource for the mountain pine beetle to better understand the unique aspects of this insect's biology. A draft de novo genome sequence was assembled from paired-end, short-read sequences from an individual field-collected male pupa, and scaffolded using mate-paired, short-read genomic sequences from pooled field-collected pupae, paired-end short-insert whole-transcriptome shotgun sequencing reads of mRNA from adult beetle tissues, and paired-end Sanger EST sequences from various life stages. We describe the cytochrome P450, glutathione S-transferase, and plant cell wall-degrading enzyme gene families important to the survival of the mountain pine beetle in its harsh and nutrient-poor host environment, and examine genome-wide single-nucleotide polymorphism variation. A horizontally transferred bacterial sucrose-6-phosphate hydrolase was evident in the genome, and its tissue-specific transcription suggests a functional role for this beetle. Conclusions Despite Coleoptera being the largest insect order with over 400,000 described species, including many agricultural and forest pest species, this is only the second genome sequence reported in Coleoptera, and will provide an important resource for the Curculionoidea and other insects. PMID:23537049
Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.
Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong
2014-05-01
We characterized the mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of the mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as the closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in the mango cp genome, 91 were found to be protein coding. Sequence analysis revealed the cp genome of C. sinensis as the closest neighbor of mango. We found 51 short repeats in the mango cp genome that are supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.
Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie
2017-07-01
The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna and Flora Seventeenth meeting of the Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering for polymerase chain reaction-duplicated and low-quality reads, ∼137 Gb of clean data (×70 depth) were obtained for genome assembly. We yielded a draft genome assembly with a total length of 2.24 Gb and an N50 scaffold size of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provides an essential tool for conservation efforts. It also provides a valuable resource for future study of squamate evolution. © The Authors 2017. Published by Oxford University Press.
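The data volumes and depths quoted in this abstract are linked by simple arithmetic: depth is total sequenced bases divided by genome size. As a rough check, and assuming that this plain ratio is how the depths were derived (the abstract does not state the genome size used), the implied genome size can be recovered as below. The function name `implied_genome_size_gb` is hypothetical.

```python
def implied_genome_size_gb(total_bases_gb, depth):
    """Genome size (Gb) implied by a data volume (Gb) and a fold depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases_gb / depth

# Figures taken from the abstract: ~291 Gb raw at 149x, ~137 Gb clean at 70x.
raw_size = implied_genome_size_gb(291, 149)
clean_size = implied_genome_size_gb(137, 70)

print(f"implied size from raw data:   {raw_size:.2f} Gb")
print(f"implied size from clean data: {clean_size:.2f} Gb")
```

Both ratios come out a little under 2 Gb, somewhat below the 2.24 Gb assembly length, which suggests the depths were computed against a separate genome size estimate; the abstract does not say which.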
Vikram, Surendra; Kumar, Shailesh; Vaidya, Bhumika; Pinnaka, Anil Kumar
2013-01-01
We report the 4.39-Mb draft genome sequence of the 2-chloro-4-nitrophenol-degrading bacterium Arthrobacter sp. strain SJCon, isolated from a pesticide-contaminated site. The draft genome sequence of strain SJCon will be helpful in studying the genetic pathways involved in the degradation of several aromatic compounds. PMID:23516196
Salgado-Morales, Rosalba; Rivera-Gómez, Nancy; Lozano-Aguirre Beltrán, Luis Fernando; Hernández-Mendoza, Armando; Dantán-González, Edgar
2017-09-07
We report the draft genome sequence of the Gram-negative bacterium Pseudomonas aeruginosa NA04, isolated from the entomopathogenic nematode Heterorhabditis indica MOR03. The draft genome consists of 54 contigs, with a length of 6.37 Mb and a G+C content of 66.49%. Copyright © 2017 Salgado-Morales et al.
Toward the automated generation of genome-scale metabolic networks in the SEED.
DeJongh, Matthew; Formsma, Kevin; Boillot, Paul; Gould, John; Rycenga, Matthew; Best, Aaron
2007-04-26
Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. 
Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tiwari, Ravi; Howieson, John; Yates, Ron
Bradyrhizobium sp. WSM1253 is a novel N2-fixing bacterium isolated from a root nodule of the herbaceous annual legume Ornithopus compressus that was growing on the Greek Island of Sifnos. WSM1253 emerged as a strain of interest in an Australian program that was selecting inoculant quality bradyrhizobial strains for inoculation of Mediterranean species of lupins (Lupinus angustifolius, L. princei, L. atlanticus, L. pilosus). In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 8,719,808 bp genome has a G+C content of 63.09% with 71 contigs arranged into two scaffolds. The assembled genome contains 8,432 protein-coding genes, 66 RNA genes and a single rRNA operon. In conclusion, this improved-high-quality draft rhizobial genome is one of 20 sequenced through a DOE Joint Genome Institute 2010 Community Sequencing Project.
Tiwari, Ravi; Howieson, John; Yates, Ron; ...
2015-11-30
Bradyrhizobium sp. WSM1253 is a novel N2-fixing bacterium isolated from a root nodule of the herbaceous annual legume Ornithopus compressus that was growing on the Greek Island of Sifnos. WSM1253 emerged as a strain of interest in an Australian program that was selecting inoculant quality bradyrhizobial strains for inoculation of Mediterranean species of lupins (Lupinus angustifolius, L. princei, L. atlanticus, L. pilosus). In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 8,719,808 bp genome has a G+C content of 63.09% with 71 contigs arranged into two scaffolds. The assembled genome contains 8,432 protein-coding genes, 66 RNA genes and a single rRNA operon. In conclusion, this improved-high-quality draft rhizobial genome is one of 20 sequenced through a DOE Joint Genome Institute 2010 Community Sequencing Project.
Draft versus finished sequence data for DNA and protein diagnostic signature development
Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.
2005-01-01
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10^-3 to 10^-5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783
A lncRNA Perspective into (Re)Building the Heart.
Frank, Stefan; Aguirre, Aitor; Hescheler, Juergen; Kurian, Leo
2016-01-01
Our conception of the human genome, long focused on the 2% that codes for proteins, has profoundly changed since its first draft assembly in 2001. Since then, an unanticipatedly expansive functionality and convolution has been attributed to the majority of the genome that is transcribed in a cell-type/context-specific manner into transcripts with no apparent protein coding ability. While the majority of these transcripts, currently annotated as long non-coding RNAs (lncRNAs), are functionally uncharacterized, their prominent role in embryonic development and tissue homeostasis, especially in the context of the heart, is emerging. In this review, we summarize and discuss the latest advances in understanding the relevance of lncRNAs in (re)building the heart.
Draft genome of the living fossil Ginkgo biloba.
Guan, Rui; Zhao, Yunpeng; Zhang, He; Fan, Guangyi; Liu, Xin; Zhou, Wenbin; Shi, Chengcheng; Wang, Jiahao; Liu, Weiqing; Liang, Xinming; Fu, Yuanyuan; Ma, Kailong; Zhao, Lijun; Zhang, Fumin; Lu, Zuhong; Lee, Simon Ming-Yuen; Xu, Xun; Wang, Jian; Yang, Huanming; Fu, Chengxin; Ge, Song; Chen, Wenbin
2016-11-21
Ginkgo biloba L. (Ginkgoaceae) is one of the most distinctive plants. It possesses a suite of fascinating characteristics including a large genome, outstanding resistance/tolerance to abiotic and biotic stresses, and dioecious reproduction, making it an ideal model species for biological studies. However, the lack of a high-quality genome sequence has been an impediment to our understanding of its biology and evolution. The 10.61 Gb genome sequence containing 41,840 annotated genes was assembled in the present study. Repetitive sequences account for 76.58% of the assembled sequence, and long terminal repeat retrotransposons (LTR-RTs) are particularly prevalent. The diversity and abundance of LTR-RTs is due to their gradual accumulation and a remarkable amplification between 16 and 24 million years ago, and they contribute to the long introns and large genome. Whole genome duplication (WGD) may have occurred twice, with an ancient WGD consistent with that shown to occur in other seed plants, and a more recent event specific to ginkgo. Abundant gene clusters from tandem duplication were also evident, and enrichment of expanded gene families indicates a remarkable array of chemical and antibacterial defense pathways. The ginkgo genome consists mainly of LTR-RTs resulting from ancient gradual accumulation and two WGD events. The multiple defense mechanisms underlying the characteristic resilience of ginkgo are fostered by a remarkable enrichment in ancient duplicated and ginkgo-specific gene clusters. The present study sheds light on sequencing large genomes, and opens an avenue for further genetic and evolutionary research.
Genome sequence of the olive tree, Olea europaea.
Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni
2016-06-27
The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95% of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79%. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
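Several abstracts in this listing report a scaffold N50. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. The scaffold lengths below are toy values, not data from any of the assemblies described, and the function name `n50` is only illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection of positive values")

# Toy assembly: total length 300, so the halfway point is 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)  # 80 + 70 reaches 150, so the N50 is 70

# Fraction of the estimated genome captured, using the olive figures above.
fraction_assembled = 1.31 / 1.38
```

Because N50 depends only on the length distribution, a high value says nothing about whether the scaffolds are correctly joined; it is best read alongside completeness estimates such as the CEGMA figure quoted in the abstract.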
Gutiérrez Acosta, Olga B; Schleheck, David; Schink, Bernhard
2014-07-11
The sulfate-reducing bacterium Desulfococcus biacutus is able to utilize acetone for growth by an inducible degradation pathway that involves a novel activation reaction for acetone with CO as a co-substrate. The mechanism, enzyme(s) and gene(s) involved in this acetone activation reaction are of great interest because they represent a novel and yet undefined type of activation reaction under strictly anoxic conditions. In this study, a draft genome sequence of D. biacutus was established. Sequencing, assembly and annotation resulted in 159 contigs with 5,242,029 base pairs and 4773 predicted genes; 4708 were predicted protein-encoding genes, and 3520 of these had a functional prediction. Proteins and genes were identified that are specifically induced during growth with acetone. A thiamine diphosphate-requiring enzyme appeared to be highly induced during growth with acetone and is probably involved in the activation reaction. Moreover, a coenzyme B12- dependent enzyme and proteins that are involved in redox reactions were also induced during growth with acetone. We present for the first time the genome of a sulfate reducer that is able to grow with acetone. The genome information of this organism represents an important tool for the elucidation of a novel reaction mechanism that is employed by a sulfate reducer in acetone activation.
Hane, James K; Ming, Yao; Kamphuis, Lars G; Nelson, Matthew N; Garg, Gagan; Atkins, Craig A; Bayer, Philipp E; Bravo, Armando; Bringans, Scott; Cannon, Steven; Edwards, David; Foley, Rhonda; Gao, Ling-Ling; Harrison, Maria J; Huang, Wei; Hurgobin, Bhavna; Li, Sean; Liu, Cheng-Wu; McGrath, Annette; Morahan, Grant; Murray, Jeremy; Weller, James; Jian, Jianbo; Singh, Karam B
2017-03-01
Lupins are important grain legume crops that form a critical part of sustainable farming systems, reducing fertilizer use and providing disease breaks. They have a basal phylogenetic position relative to other crop and model legumes and a high speciation rate. Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is gaining popularity as a health food that is high in protein and dietary fibre, low in starch, and gluten-free. We report the draft genome assembly (609 Mb) of NLL cultivar Tanjil, which has captured >98% of the gene content, sequences of additional lines and a dense genetic map. Lupins are unique among legumes and differ from most other land plants in that they do not form mycorrhizal associations. Remarkably, we find that NLL has lost all mycorrhiza-specific genes, but has retained genes commonly required for mycorrhization and nodulation. In addition, the genome provided candidate genes for key disease resistance and domestication traits. We also find evidence of a whole-genome triplication at around 25 million years ago in the genistoid lineage leading to Lupinus. Our results will support detailed studies of legume evolution and accelerate lupin breeding programmes. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Genome sequencing of the winged midge, Parochlus steinenii, from the Antarctic Peninsula.
Kim, Sanghee; Oh, Mijin; Jung, Woongsic; Park, Joonho; Choi, Han-Gu; Shin, Seung Chul
2017-03-01
In the Antarctic, only two species of Chironomidae occur naturally: the wingless midge, Belgica antarctica, and the winged midge, Parochlus steinenii. B. antarctica is an extremophile with unusual adaptations. The larvae of B. antarctica are desiccation- and freeze-tolerant and the adults are wingless. Recently, the compact genome of B. antarctica was reported and it is the first Antarctic eukaryote to be sequenced. Although P. steinenii occurs naturally in the Antarctic with B. antarctica, the larvae of P. steinenii are cold-tolerant but not freeze-tolerant and the adults are winged. Differences in adaptations in the Antarctic midges are interesting in terms of evolutionary processes within an extreme environment. Herein, we provide the genome of another Antarctic midge to help elucidate the evolution of these species. The draft genome of P. steinenii had a total size of 138 Mbp, comprising 9513 contigs with an N50 contig size of 34,110 bp, and a GC content of 32.2%. Overall, 13,468 genes were predicted using the MAKER annotation pipeline, and gene ontology classified 10,801 (80.2%) predicted genes to a function. Compared with the assembled genome architecture of B. antarctica, that of P. steinenii was approximately 50 Mbp longer with 6.2-fold more repeat sequences, whereas gene regions were as similarly compact as in B. antarctica. We present an annotated draft genome of the Antarctic midge, P. steinenii. The genomes of P. steinenii and B. antarctica will aid in the elucidation of evolution in harsh environments and provide new resources for functional genomic analyses of the order Diptera. © The Authors 2017. Published by Oxford University Press.
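Several abstracts in this listing summarize assembly contiguity with an N50 value (for example, the N50 contig size of 34,110 bp above). As a reminder of what that statistic measures, here is a minimal sketch in Python. The function name `n50` and the example lengths are illustrative assumptions, not taken from any of the cited studies, and individual assemblers may report the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp); the total is 300 bp.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In the toy assembly, the two longest contigs (80 + 70 bp) already reach half of the 300 bp total, so the N50 is the length of the second contig.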
Iwanowicz, Luke R; Iwanowicz, Deborah D; Adams, Cynthia R; Galbraith, Heather; Aunins, Aaron; Cornman, Robert S
2017-10-12
Here, we report a draft genome sequence of a picorna-like virus associated with brook trout, Salvelinus fontinalis, gill tissue. The draft genome comprises 8,681 nucleotides, excluding the poly(A) tract, and contains two open reading frames. It is most similar to picorna-like viruses that infect invertebrates.
Seuylemezian, Arman; Singh, Nitin K; Vaishampayan, Parag; Venkateswaran, Kasthuri
2017-08-31
We report here the draft genome of Solibacillus kalamii ISSFR-015, isolated from a high-energy particulate arrestance filter aboard the International Space Station. The draft genome sequence of this strain contains 3,809,180 bp with an estimated G+C content of 38.61%. Copyright © 2017 Seuylemezian et al.
Genome sequence of Phytophthora ramorum: implications for management
Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore
2006-01-01
A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
Liang, Zhishu; Li, Guiying; Das, Ranjit
2016-01-01
Here, we report the draft genome sequence of Bacillus sp. strain GZT, a 2,4,6-tribromophenol (TBP)-degrading bacterium previously isolated from an electronic waste-dismantling region. The draft genome sequence is 5.18 Mb and has a G+C content of 35.1%. This is the first genome report of a brominated flame retardant-degrading strain. PMID:27257197
Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca
2015-01-01
Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
Coyne, Robert S; Thiagarajan, Mathangi; Jones, Kristie M; Wortman, Jennifer R; Tallon, Luke J; Haas, Brian J; Cassidy-Hanley, Donna M; Wiley, Emily A; Smith, Joshua J; Collins, Kathleen; Lee, Suzanne R; Couvillion, Mary T; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E; Hamilton, Eileen P; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A
2008-01-01
Background Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. Results We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. Conclusion We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. 
Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes. PMID:19036158
2013-01-01
Background Though India has sequenced water buffalo genome but its draft assembly is based on cattle genome BTau 4.0, thus de novo chromosome wise assembly is a major pending issue for global community. The existing radiation hybrid of buffalo and these reported STR can be used further in final gap plugging and “finishing” expected in de novo genome assembly. QTL and gene mapping needs mining of putative STR from buffalo genome at equal interval on each and every chromosome. Such markers have potential role in improvement of desirable characteristics, such as high milk yields, resistance to diseases, high growth rate. The STR mining from whole genome and development of user friendly database is yet to be done to reap the benefit of whole genome sequence. Description By in silico microsatellite mining of whole genome, we have developed first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database (http://cabindb.iasri.res.in/buffsatdb/) which is a web based relational database of 910529 microsatellite markers, developed using PHP and MySQL database. Microsatellite markers have been generated using MIcroSAtellite tool. It is simple and systematic web based search for customised retrieval of chromosome wise and genome-wide microsatellites. Search has been enabled based on chromosomes, motif type (mono-hexa), repeat motif and repeat kind (simple and composite). The search may be customised by limiting location of STR on chromosome as well as number of markers in that range. This is a novel approach and not been implemented in any of the existing marker database. This database has been further appended with Primer3 for primer designing of the selected markers enabling researcher to select markers of choice at desired interval over the chromosome. The unique add-on of degenerate bases further helps in resolving presence of degenerate bases in current buffalo assembly. 
Conclusion Being the first buffalo STR database in the world, this would not only pave the way in resolving the current assembly problem but shall be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially for third world countries where the rural economy is significantly dependent on buffalo productivity. PMID:23336431
Sarika; Arora, Vasu; Iquebal, Mir Asif; Rai, Anil; Kumar, Dinesh
2013-01-19
Though India has sequenced water buffalo genome but its draft assembly is based on cattle genome BTau 4.0, thus de novo chromosome wise assembly is a major pending issue for global community. The existing radiation hybrid of buffalo and these reported STR can be used further in final gap plugging and "finishing" expected in de novo genome assembly. QTL and gene mapping needs mining of putative STR from buffalo genome at equal interval on each and every chromosome. Such markers have potential role in improvement of desirable characteristics, such as high milk yields, resistance to diseases, high growth rate. The STR mining from whole genome and development of user friendly database is yet to be done to reap the benefit of whole genome sequence. By in silico microsatellite mining of whole genome, we have developed first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database (http://cabindb.iasri.res.in/buffsatdb/) which is a web based relational database of 910529 microsatellite markers, developed using PHP and MySQL database. Microsatellite markers have been generated using MIcroSAtellite tool. It is simple and systematic web based search for customised retrieval of chromosome wise and genome-wide microsatellites. Search has been enabled based on chromosomes, motif type (mono-hexa), repeat motif and repeat kind (simple and composite). The search may be customised by limiting location of STR on chromosome as well as number of markers in that range. This is a novel approach and not been implemented in any of the existing marker database. This database has been further appended with Primer3 for primer designing of the selected markers enabling researcher to select markers of choice at desired interval over the chromosome. The unique add-on of degenerate bases further helps in resolving presence of degenerate bases in current buffalo assembly. 
Being the first buffalo STR database in the world, this would not only pave the way in resolving the current assembly problem but shall be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially for third world countries where the rural economy is significantly dependent on buffalo productivity.
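The abstract above describes in silico mining of microsatellites with repeat units from mono- to hexanucleotide. The following is a minimal sketch of that general idea using a regular expression in Python. It is not the MIcroSAtellite tool used by the authors: the function name `find_strs`, the single `min_copies` threshold, and the example sequence are assumptions made for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_strs(seq, min_copies=4, max_unit=6):
    """Find perfect tandem repeats with unit sizes of 1..max_unit bases.

    Returns a list of (start, unit, copies) tuples, where start is the
    0-based position of the repeat tract. Illustrative only: real tools
    typically use a separate copy-number threshold per unit size.
    """
    # Lazy unit match so the shortest repeating unit is preferred,
    # followed by at least (min_copies - 1) further copies of it.
    pattern = re.compile(
        r"(?P<unit>[ACGT]{1,%d}?)(?P=unit){%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group("unit")
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: a (AC)5 dinucleotide repeat flanked by other bases.
example = "GGTT" + "AC" * 5 + "GG"
print(find_strs(example))
```

A production pipeline would also handle compound repeats, ambiguous bases, and genome-scale input, none of which this sketch attempts.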
2017-01-01
Here, we report a draft genome sequence of a picorna-like virus associated with brook trout, Salvelinus fontinalis, gill tissue. The draft genome comprises 8,681 nucleotides, excluding the poly(A) tract, and contains two open reading frames. It is most similar to picorna-like viruses that infect invertebrates. PMID:29025930
Seribelli, Amanda Aparecida; Frazão, Miliane Rodrigues; Gonzales, Júlia Cunha; Cao, Guojie; Leon, Maria Sanchez; Kich, Jalusa Deon; Allard, Marc William; Falcão, Juliana Pfrimer
2018-04-19
Salmonellosis is a disease with a high incidence worldwide, and Salmonella enterica subsp. enterica serovar Typhimurium is one of the most clinically important serovars. We report here the draft genome sequences of 20 S. Typhimurium strains isolated from swine in Santa Catarina, Brazil. These draft genomes will improve our understanding of S. Typhimurium in Brazil.
Draft Genome Sequence of Ideonella sp. Strain A 288, Isolated from an Iron-Precipitating Biofilm
Künzel, Sven; Szewzyk, Ulrich
2017-01-01
Here, we report the draft genome sequence of the betaproteobacterium Ideonella sp. strain A_228. This isolate, obtained from a bog iron ore-containing floodplain area in Germany, provides valuable information about the genetic diversity of neutrophilic iron-depositing bacteria. The Illumina NextSeq technique was used to sequence the draft genome of the strain. PMID:28818902
Draft genome of agar-degrading marine bacterium Gilvimarinus agarilyticus JEA5.
Lee, Youngdeuk; Lee, Su-Jin; Park, Gun-Hoo; Heo, Soo-Jin; Umasuthan, Navaneethaiyer; Kang, Do-Hyung; Oh, Chulhong
2015-06-01
Gilvimarinus agarilyticus JEA5, which effectively degrades agar, was isolated from the seawater of Jeju Island, Republic of Korea. Here, we report the draft genome sequence of G. agarilyticus JEA5 with a total genome size of 4,179,438bp from 2 scaffolds (21 contigs) with 53.15% G+C content. Various polysaccharidases including 11 predicted agarases were observed from the draft genome of G. agarilyticus JEA5. Copyright © 2015 Elsevier B.V. All rights reserved.
Diop, Awa; Diop, Khoudia; Tomei, Enora; Raoult, Didier; Fenollar, Florence; Fournier, Pierre-Edouard
2018-03-01
We report here the draft genome sequence of Ezakiella peruensis strain M6.X2T. The draft genome is 1,672,788 bp long and harbors 1,589 predicted protein-encoding genes, including 26 antibiotic resistance genes with 1 gene encoding vancomycin resistance. The genome also exhibits 1 clustered regularly interspaced short palindromic repeat region and 333 genes acquired by horizontal gene transfer. Copyright © 2018 Diop et al.
Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers.
Varshney, Rajeev K; Chen, Wenbin; Li, Yupeng; Bharti, Arvind K; Saxena, Rachit K; Schlueter, Jessica A; Donoghue, Mark T A; Azam, Sarwar; Fan, Guangyi; Whaley, Adam M; Farmer, Andrew D; Sheridan, Jaime; Iwata, Aiko; Tuteja, Reetu; Penmetsa, R Varma; Wu, Wei; Upadhyaya, Hari D; Yang, Shiaw-Pyng; Shah, Trushar; Saxena, K B; Michael, Todd; McCombie, W Richard; Yang, Bicheng; Zhang, Gengyun; Yang, Huanming; Wang, Jun; Spillane, Charles; Cook, Douglas R; May, Gregory D; Xu, Xun; Jackson, Scott A
2011-11-06
Pigeonpea is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. We used the Illumina next-generation sequencing platform to generate 237.2 Gb of sequence, which along with Sanger-based bacterial artificial chromosome end sequences and a genetic map, we assembled into scaffolds representing 72.7% (605.78 Mb) of the 833.07 Mb pigeonpea genome. Genome analysis predicted 48,680 genes for pigeonpea and also showed the potential role that certain gene families, for example, drought tolerance-related genes, have played throughout the domestication of pigeonpea and the evolution of its ancestors. Although we found a few segmental duplication events, we did not observe the recent genome-wide duplication events observed in soybean. This reference genome sequence will facilitate the identification of the genetic basis of agronomically important traits, and accelerate the development of improved pigeonpea varieties that could improve food security in many developing countries.
The genome of woodland strawberry (Fragaria vesca)
Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Girona, Elena Lopez; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M
2012-01-01
The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted. PMID:21186353
Behera, Pratiksha; Vaishampayan, Parag; Singh, Nitin K; Mishra, Samir R; Raina, Vishakha; Suar, Mrutyunjay; Pattnaik, Ajit K; Rastogi, Gurdeep
2016-09-01
To date, only one draft genome has been reported within the genus Mangrovibacter. Here, we report the second draft genome shotgun sequence, that of Mangrovibacter sp. strain MP23, which was isolated from the roots of Phragmites karka (P. karka), an invasive weed growing in the Chilika Lagoon, Odisha, India. Strain MP23 is a facultatively anaerobic, nitrogen-fixing endophytic bacterium that grows optimally at 37 °C, 7.0 pH, and 1% NaCl concentration. The draft genome sequence of strain MP23 contains 4,947,475 bp with an estimated G + C content of 49.9% and a total of 4392 protein coding genes. The genome sequence has provided information on putative genes that code for proteins involved in oxidative stress, uptake of nutrients, and nitrogen fixation that might offer niche-specific ecological fitness and explain the invasive success of P. karka in Chilika Lagoon. The draft genome sequence and annotation have been deposited at DDBJ/EMBL/GenBank under the accession number LYRP00000000.
USDA-ARS's Scientific Manuscript database
This report includes the complete genome of the Campylobacter concisus type strain ATCC 33237T and the draft genomes of eight additional well characterized C. concisus genomes. C. concisus has been shown to be a genetically heterogeneous species and these nine genomes provide valuable information re...
Hoy, Marjorie A.; Waterhouse, Robert M.; Wu, Ke; Estep, Alden S.; Ioannidis, Panagiotis; Palmer, William J.; Pomerantz, Aaron F.; Simão, Felipe A.; Thomas, Jainy; Jiggins, Francis M.; Murphy, Terence D.; Pritham, Ellen J.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Gibbs, Richard A.; Richards, Stephen
2016-01-01
Metaseiulus occidentalis is an eyeless phytoseiid predatory mite employed for the biological control of agricultural pests including spider mites. Despite appearances, these predator and prey mites are separated by some 400 Myr of evolution and radically different lifestyles. We present a 152-Mb draft assembly of the M. occidentalis genome: Larger than that of its favored prey, Tetranychus urticae, but considerably smaller than those of many other chelicerates, enabling an extremely contiguous and complete assembly to be built—the best arachnid to date. Aided by transcriptome data, genome annotation cataloged 18,338 protein-coding genes and identified large numbers of Helitron transposable elements. Comparisons with other arthropods revealed a particularly dynamic and turbulent genomic evolutionary history. Its genes exhibit elevated molecular evolution, with strikingly high numbers of intron gains and losses, in stark contrast to the deer tick Ixodes scapularis. Uniquely among examined arthropods, this predatory mite’s Hox genes are completely atomized, dispersed across the genome, and it encodes five copies of the normally single-copy RNA processing Dicer-2 gene. Examining gene families linked to characteristic biological traits of this tiny predator provides initial insights into processes of sex determination, development, immune defense, and how it detects, disables, and digests its prey. As the first reference genome for the Phytoseiidae, and for any species with the rare sex determination system of parahaploidy, the genome of the western orchard predatory mite improves genomic sampling of chelicerates and provides invaluable new resources for functional genomic analyses of this family of agriculturally important mites. PMID:26951779
Hosmani, Prashant S.; Villalobos-Ayala, Krystal; Miller, Sherry; Shippy, Teresa; Flores, Mirella; Rosendale, Andrew; Cordola, Chris; Bell, Tracey; Mann, Hannah; DeAvila, Gabe; DeAvila, Daniel; Moore, Zachary; Buller, Kyle; Ciolkevich, Kathryn; Nandyal, Samantha; Mahoney, Robert; Van Voorhis, Joshua; Dunlevy, Megan; Farrow, David; Hunter, David; Morgan, Taylar; Shore, Kayla; Guzman, Victoria; Izsak, Allison; Dixon, Danielle E.; Cridge, Andrew; Cano, Liliana; Cao, Xiaolong; Jiang, Haobo; Leng, Nan; Johnson, Shannon; Cantarel, Brandi L.; Richards, Stephen; English, Adam; Shatters, Robert G.; Childers, Chris; Chen, Mei-Ju; Hunter, Wayne; Cilia, Michelle; Mueller, Lukas A.; Munoz-Torres, Monica; Nelson, David; Poelchau, Monica F.; Benoit, Joshua B.; Wiersma-Koch, Helen; D’Elia, Tom; Brown, Susan J.
2017-01-01
Abstract The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models. Database URL: https://citrusgreening.org/ PMID:29220441
O-Thong, Sompong; Khongkliang, Peerawat; Mamimin, Chonticha; Singkhala, Apinya; Prasertsan, Poonsuk; Birkeland, Nils-Kåre
2017-06-01
Thermoanaerobacterium sp. strain PSU-2 was isolated from a thermophilic hydrogen-producing reactor, subjected to draft genome sequencing by 454 pyrosequencing, and annotated with RAST. The draft genome sequence of strain PSU-2 contains 2,552,497 bases with an estimated G + C content of 35.2%, 2555 CDS, 8 rRNAs and 57 tRNAs. The strain had genes responsible for carbohydrate metabolism, amino acids and derivatives, and protein metabolism at 17.7%, 14.39% and 9.81%, respectively. Strain PSU-2 also had genes responsible for hydrogen biosynthesis as well as genes related to Ni-Fe hydrogenase. Comparative genomic analysis indicates that strain PSU-2 shares about 94% genome sequence similarity with Thermoanaerobacterium xylanolyticum LX-11. The nucleotide sequence of this draft genome was deposited into DDBJ/ENA/GenBank under the accession MSQD00000000.
Wüthrich, Daniel; Bruggmann, Rémy; Berthoud, Hélène; Arias-Roth, Emmanuelle
2015-01-01
Clostridium tyrobutyricum is the main microorganism responsible for late blowing defect in cheeses. Here, we present the draft genome sequences of two C. tyrobutyricum strains isolated from a Swiss semihard red-smear cheese. The two draft genomes comprise 3.05 and 3.08 Mbp and contain 3,030 and 3,089 putative coding sequences, respectively. PMID:25767226
Hazen, Tracy H.; Mettus, Roberta T.; McElheny, Christi L.; Bowler, Sarah L.
2018-01-01
We report here the draft genome sequences of four blaKPC-containing bacteria identified as Klebsiella aerogenes, Citrobacter freundii, and Citrobacter koseri. Additionally, we report the draft genome sequence of a K. aerogenes strain that did not contain a blaKPC gene but was isolated from the patient who had the blaKPC-2-containing K. aerogenes strain. PMID:29472325
Hazen, Tracy H; Mettus, Roberta T; McElheny, Christi L; Bowler, Sarah L; Doi, Yohei; Rasko, David A
2018-02-22
We report here the draft genome sequences of four blaKPC-containing bacteria identified as Klebsiella aerogenes, Citrobacter freundii, and Citrobacter koseri. Additionally, we report the draft genome sequence of a K. aerogenes strain that did not contain a blaKPC gene but was isolated from the patient who had the blaKPC-2-containing K. aerogenes strain. Copyright © 2018 Hazen et al.
Dua, Ankita; Sangwan, Naseer; Kaur, Jasvinder; Saxena, Anjali; Kohli, Puneet; Gupta, A. K.
2013-01-01
We report here the draft genome sequence of the alphaproteobacterium Agrobacterium sp. strain UHFBA-218, which was isolated from rhizosphere soil of crown gall-infected cherry rootstock Colt. The draft genome of strain UHFBA-218 consists of 112 contigs (5,425,303 bp) and 5,063 coding sequences with a G+C content of 59.8%. PMID:23723402
Genome Analysis of Staphylococcus agnetis, an Agent of Lameness in Broiler Chickens
Ojha, Sohita; Pummill, Jeff F.; Koon, Joseph A.; Wideman, Robert F.; Rhoads, Douglas D.
2015-01-01
Lameness in broiler chickens is a significant animal welfare and financial issue. Lameness can be enhanced by rearing young broilers on wire flooring. We have identified Staphylococcus agnetis as significantly involved in bacterial chondronecrosis with osteomyelitis (BCO) in proximal tibia and femorae, leading to lameness in broiler chickens in the wire floor system. Administration of S. agnetis in water induces lameness. Previously reported in some cases of cattle mastitis, this is the first report of this poorly described pathogen in chickens. We used long and short read next generation sequencing to assemble single finished contigs for the genome and a large plasmid from the chicken pathogen. Comparison of the S. agnetis genome to those of other pathogenic Staphylococci shows that S. agnetis contains a distinct repertoire of virulence determinants. Additionally, the S. agnetis genome has several regions that differ substantially from the genomes of other pathogenic Staphylococci. Comparison of our finished genome to a recent draft genome for a cattle mastitis isolate suggests that future investigations focus on the evolutionary epidemiology of this emerging pathogen of domestic animals. PMID:26606420
Yassin, Atteyet F; Langenberg, Stefan; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Mukherjee, Supratim; Reddy, T B K; Daum, Chris; Shapiro, Nicole; Ivanova, Natalia; Woyke, Tanja; Kyrpides, Nikos C
2017-01-01
The permanent draft genome sequence of Actinotignum schaalii DSM 15541T is presented. The annotated genome includes 2,130,987 bp, with 1777 protein-coding and 58 rRNA-coding genes. Genome sequence analysis revealed absence of genes encoding for: components of the PTS systems, enzymes of the TCA cycle, glyoxylate shunt and gluconeogenesis. Genomic data revealed that A. schaalii is able to oxidize carbohydrates via glycolysis, the nonoxidative pentose phosphate and the Entner-Doudoroff pathways. Besides, the genome harbors genes encoding for enzymes involved in the conversion of pyruvate to lactate, acetate and ethanol, which are found to be the end products of carbohydrate fermentation. The genome contained the gene encoding Type I fatty acid synthase required for de novo FAS biosynthesis. The plsY and plsX genes encoding the acyltransferases necessary for phosphatidic acid biosynthesis were absent from the genome. The genome harbors genes encoding enzymes responsible for isoprene biosynthesis via the mevalonate (MVA) pathway. Genes encoding enzymes that confer resistance to reactive oxygen species (ROS) were identified. In addition, A. schaalii harbors genes that protect the genome against viral infections. These include restriction-modification (RM) systems, type II toxin-antitoxin (TA), CRISPR-Cas and abortive infection system. The A. schaalii genome also encodes several virulence factors that contribute to adhesion and internalization of this pathogen, such as the tad genes encoding proteins required for pili assembly, the nanI gene encoding exo-alpha-sialidase, genes encoding heat shock proteins and genes encoding type VII secretion system. These features are consistent with anaerobic and pathogenic lifestyles. Finally, resistance to ciprofloxacin occurs by mutation in chromosomal genes that encode the subunits of DNA-gyrase (GyrA) and topoisomerase IV (ParC) enzymes, while resistance to metronidazole was due to the frxA gene, which encodes NADPH-flavin oxidoreductase.
Yuliana, Tri; Nakajima, Nobuyoshi; Yamamura, Shigeki; Tomita, Masaru; Suzuki, Haruo; Amachi, Seigo
2017-01-01
Roseovarius sp. A-2 is a heterotrophic iodide (I−)-oxidizing bacterium isolated from iodide-rich natural gas brine water in Chiba, Japan. This strain oxidizes iodide to molecular iodine (I2) by means of an extracellular multicopper oxidase. Here we report the draft genome sequence of strain A-2. The draft genome contained 46 tRNA genes, 1 copy of a 16S-23S-5S rRNA operon, and 4,514 protein coding DNA sequences, of which 1,207 (27%) were hypothetical proteins. The genome contained a gene encoding IoxA, a multicopper oxidase previously found to catalyze the oxidation of iodide in Iodidimonas sp. Q-1. This draft genome provides detailed insights into the metabolism and potential application of Roseovarius sp. A-2.
Draft genome sequence of two Sphingopyxis sp. strains H107 and H115 isolated from a chloraminated drinking water distribution system simulator. This dataset is associated with the following publication: Gomez-Alvarez, V., S. Pfaller, and R. Revetta. Draft Genome of Two Sphingopyxis sp. Strains, Dominant Members of the Bacterial Community Associated with a Drinking Water Distribution System Simulator. Genome Announcements. American Society for Microbiology, Washington, DC, USA, 4(2): e00183-16, (2016).
Storari, Michelangelo; Wüthrich, Daniel; Bruggmann, Rémy; Berthoud, Hélène; Arias-Roth, Emmanuelle
2015-03-12
Clostridium tyrobutyricum is the main microorganism responsible for late blowing defect in cheeses. Here, we present the draft genome sequences of two C. tyrobutyricum strains isolated from a Swiss semihard red-smear cheese. The two draft genomes comprise 3.05 and 3.08 Mbp and contain 3,030 and 3,089 putative coding sequences, respectively. Copyright © 2015 Storari et al.
The draft genome sequence and annotation of the desert woodrat Neotoma lepida.
Campbell, Michael; Oakeson, Kelly F; Yandell, Mark; Halpert, James R; Dearing, Denise
2016-09-01
We present the de novo draft genome sequence for a vertebrate mammalian herbivore, the desert woodrat (Neotoma lepida). This species is of ecological and evolutionary interest with respect to ingestion, microbial detoxification and hepatic metabolism of toxic plant secondary compounds from the highly toxic creosote bush (Larrea tridentata) and the juniper shrub (Juniperus monosperma). The draft genome sequence and annotation have been deposited at GenBank under the accession LZPO01000000.
Kohli, Puneet; Dua, Ankita; Sangwan, Naseer; Oldach, Phoebe; Khurana, J. P.
2013-01-01
Here, we report the draft genome sequence of the hexachlorocyclohexane (HCH)-degrading bacterium Sphingobium ummariense strain RL-3, which was isolated from the HCH dumpsite located in Lucknow, India (27°00′N and 81°09′E). The annotated draft genome sequence (4.75 Mb) of strain RL-3 consisted of 139 contigs, 4,645 coding sequences, and 65% G+C content. PMID:24233594
Suhaili, Zarizal; Lean, Soo-Sum; Mohamad, Noor Muzamil; Rachman, Abdul R Abdul; Desa, Mohd Nasir Mohd; Yeo, Chew Chieng
2016-09-01
Most of the efforts in elucidating the molecular relatedness and epidemiology of Staphylococcus aureus in Malaysia have focused on methicillin-resistant S. aureus (MRSA). Here we report the draft genome sequence of a methicillin-susceptible Staphylococcus aureus (MSSA) strain, designated KT/314250, of sequence type 1 (ST1) and spa type t127, carrying the Panton-Valentine leukocidin (pvl) pathogenic determinant and isolated from a pus sample. The draft genome is 2.86 Mbp in size, with a G + C content of 32.7% and 2673 coding sequences. The draft genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number AOCP00000000.
Nam, Bo-Hye; Kwak, Woori; Kim, Young-Ok; Kim, Dong-Gyun; Kong, Hee Jeong; Kim, Woo-Jin; Kang, Jeong-Ha; Park, Jung Youn; An, Cheul Min; Moon, Ji-Young; Park, Choul Ji; Yu, Jae Woong; Yoon, Joon; Seo, Minseok; Kim, Kwondo; Kim, Duk Kyung; Lee, SaetByeol; Sung, Samsun; Lee, Chul; Shin, Younhee; Jung, Myunghee; Kang, Byeong-Chul; Shin, Ga-Hee; Ka, Sojeong; Caetano-Anolles, Kelsey; Cho, Seoae; Kim, Heebal
2017-05-01
Abalones are large marine snails in the family Haliotidae and the genus Haliotis belonging to the class Gastropoda of the phylum Mollusca. The family Haliotidae contains only one genus, Haliotis, and this single genus is known to contain several species of abalone. With 18 additional subspecies, the most comprehensive treatment of Haliotidae considers 56 species valid [1]. Abalone is an economically important fishery and aquaculture animal that is considered a highly prized seafood delicacy. The total global supply of abalone has increased 5-fold since the 1970s and farm production increased explosively from 50 mt to 103 464 mt in the past 40 years. Additionally, researchers have recently focused on abalone given their reported tumor suppression effect. However, despite the valuable features of this marine animal, no genomic information is available for the Haliotidae family and related research is still limited. To construct the H. discus hannai genome, a total of 580 Gb of sequence was generated using Illumina and PacBio platforms, giving 322-fold coverage based on the 1.8-Gb genome size of H. discus hannai estimated by flow cytometry. The final genome assembly consisted of 1.86 Gb with 35 450 scaffolds (>2 kb). GC content level was 40.51%, and the N50 length of assembled scaffolds was 211 kb. We identified 29 449 genes using Evidence Modeler based on the gene information from ab initio prediction, protein homology with known genes, and transcriptome evidence of RNA-seq. Here we present the first Haliotidae genome, H. discus hannai, with sequencing data, assembly, and gene annotation information. This will be helpful for resolving the lack of genomic information in the Haliotidae family as well as providing more opportunities for understanding gastropod evolution. © The Authors 2017. Published by Oxford University Press.
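The scaffold N50 quoted in the abstract above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer hold at least half of the assembled bases. The sketch below shows one common way to compute it; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the abalone assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in kb), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, a fragmented draft can share the same total size as a finished genome while reporting a much smaller N50.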
Takeda, Hiroyuki
2008-06-01
The medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system for developmental genetics and evolutionary biology. The medaka genome is relatively small in size, approximately 800 Mb, and the genome sequencing project was recently completed by Japanese research groups, providing a high-quality draft genome sequence of the inbred Hd-rR strain of medaka. In this review, I present an overview of the medaka genome project including genome resources, followed by specific findings obtained with the medaka draft genome. In particular, I focus on the analysis that was done by taking advantage of the medaka system, such as the sex chromosome differentiation and the regional history of medaka species using single nucleotide polymorphisms as genomic markers.
The draft genome of a socially polymorphic halictid bee, Lasioglossum albipes
2013-01-01
Background Taxa that harbor natural phenotypic variation are ideal for ecological genomic approaches aimed at understanding how the interplay between genetic and environmental factors can lead to the evolution of complex traits. Lasioglossum albipes is a polymorphic halictid bee that expresses variation in social behavior among populations, and common-garden experiments have suggested that this variation is likely to have a genetic component. Results We present the L. albipes genome assembly to characterize the genetic and ecological factors associated with the evolution of social behavior. The de novo assembly is comparable to other published social insect genomes, with an N50 scaffold length of 602 kb. Gene families unique to L. albipes are associated with integrin-mediated signaling and DNA-binding domains, and several appear to be expanded in this species, including the glutathione-s-transferases and the inositol monophosphatases. L. albipes has an intact DNA methylation system, and in silico analyses suggest that methylation occurs primarily in exons. Comparisons to other insect genomes indicate that genes associated with metabolism and nucleotide binding undergo accelerated evolution in the halictid lineage. Whole-genome resequencing data from one solitary and one social L. albipes female identify six genes that appear to be rapidly diverging between social forms, including a putative odorant receptor and a cuticular protein. Conclusions L. albipes represents a novel genetic model system for understanding the evolution of social behavior. It represents the first published genome sequence of a primitively social insect, thereby facilitating comparative genomic studies across the Hymenoptera as a whole. PMID:24359881
Quality scores for 32,000 genomes
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...
2014-12-08
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
Quality scores for 32,000 genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, P; Garcia, E
2003-02-06
The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements "confuse" the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been "closed", low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation. The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holarctica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.
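The genome-structure assessment mentioned in part (2) relies on two simple sequence statistics, G+C content and GC skew. A minimal sketch follows, assuming the conventional definitions (G+C)/(A+C+G+T) and (G-C)/(G+C) computed over non-overlapping windows; the function names, window size, and toy sequence are illustrative assumptions and do not reproduce the report's own tooling.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq, window):
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows.

    A trailing partial window is ignored; windows with no G or C give 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence for illustration only.
toy = "GGGCAAAACCCG"
print(gc_content(toy))
print(gc_skew(toy, 4))
```

In many bacterial chromosomes the sign of the skew changes near the replication origin and terminus, which is one reason skew profiles are inspected alongside repeat and phage locations.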
Draft Genome Sequence of a Rare Smut Relative, Tilletiaria anomala UBC 951
Toome, Merje; Kuo, Alan; Henrissat, Bernard; ...
2014-06-12
We present the draft genome sequence of the smut fungus Tilletiaria anomala UBC 951 (Basidiomycota, Ustilaginomycotina). The sequenced genome size is 18.7 Mb, consisting of 289 scaffolds and a total of 6,810 predicted genes. This is the first genome sequence published for a fungus in the order Georgefisheriales (Exobasidiomycetes).
Draft genome sequence of Enterococcus faecium strain LMG 8148.
Michiels, Joran E; Van den Bergh, Bram; Fauvart, Maarten; Michiels, Jan
2016-01-01
Enterococcus faecium, traditionally considered a harmless gut commensal, is emerging as an important nosocomial pathogen showing increasing rates of multidrug resistance. We report the draft genome sequence of E. faecium strain LMG 8148, isolated in 1968 from a human in Gothenburg, Sweden. The draft genome has a total length of 2,697,490 bp, a GC-content of 38.3%, and 2,402 predicted protein-coding sequences. The isolation of this strain predates the emergence of E. faecium as a nosocomial pathogen. Consequently, its genome can be useful in comparative genomic studies investigating the evolution of E. faecium as a pathogen.
Exome capture from the spruce and pine giga-genomes.
Suren, H; Hodgins, K A; Yeaman, S; Nurkowski, K A; Smets, P; Rieseberg, L H; Aitken, S N; Holliday, J A
2016-09-01
Sequence capture is a flexible tool for generating reduced representation libraries, particularly in species with massive genomes. We used an exome capture approach to sequence the gene space of two of the dominant species in Canadian boreal and montane forests - interior spruce (Picea glauca x engelmannii) and lodgepole pine (Pinus contorta). Transcriptome data generated with RNA-seq were coupled with draft genome sequences to design baits corresponding to 26 824 genes from pine and 28 649 genes from spruce. A total of 579 samples for spruce and 631 samples for pine were included, as well as two pine congeners and six spruce congeners. More than 50% of targeted regions were sequenced at >10× depth in each species, while ~12% captured near-target regions within 500 bp of a bait position were sequenced to a depth >10×. Much of our read data arose from off-target regions, which was likely due to the fragmented and incomplete nature of the draft genome assemblies. Capture in general was successful for the related species, suggesting that baits designed for a single species are likely to successfully capture sequences from congeners. From these data, we called approximately 10 million SNPs and INDELs in each species from coding regions, introns, untranslated and flanking regions, as well as from the intergenic space. Our study demonstrates the utility of sequence capture for resequencing in complex conifer genomes, suggests guidelines for improving capture efficiency and provides a rich resource of genetic variants for studies of selection and local adaptation in these species. © 2016 John Wiley & Sons Ltd.
A genomic overview of the population structure of Salmonella.
Alikhan, Nabil-Fareed; Zhou, Zhemin; Sergeant, Martin J; Achtman, Mark
2018-04-01
For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST), core genome MLST (cgMLST), and whole genome MLST (wgMLST) and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs). eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.
Genome Sequence of the Historical Clinical Isolate Burkholderia pseudomallei PHLS 6
DOE Office of Scientific and Technical Information (OSTI.GOV)
D’haeseleer, Patrik; Johnson, Shannon L.; Davenport, Karen W.
We present the draft genome sequence of Burkholderia pseudomallei PHLS 6, a virulent clinical strain isolated from a melioidosis patient in Bangladesh in 1960. This draft genome consists of 39 contigs and is 7,322,181 bp long.
Genome Sequence of the Historical Clinical Isolate Burkholderia pseudomallei PHLS 6
D’haeseleer, Patrik; Johnson, Shannon L.; Davenport, Karen W.; ...
2016-06-30
We present the draft genome sequence of Burkholderia pseudomallei PHLS 6, a virulent clinical strain isolated from a melioidosis patient in Bangladesh in 1960. This draft genome consists of 39 contigs and is 7,322,181 bp long.
Kobayashi, Michie; Hiraka, Yukie; Abe, Akira; Yaegashi, Hiroki; Natsume, Satoshi; Kikuchi, Hideko; Takagi, Hiroki; Saitoh, Hiromasa; Win, Joe; Kamoun, Sophien; Terauchi, Ryohei
2017-11-22
Downy mildew, caused by the oomycete pathogen Sclerospora graminicola, is an economically important disease of Gramineae crops including foxtail millet (Setaria italica). Plants infected with S. graminicola are generally stunted and often undergo a transformation of flower organs into leaves (phyllody or witches' broom), resulting in serious yield loss. To establish the molecular basis of downy mildew disease in foxtail millet, we carried out whole-genome sequencing and an RNA-seq analysis of S. graminicola. Sequence reads were generated from S. graminicola using an Illumina sequencing platform and assembled de novo into a draft genome sequence comprising approximately 360 Mbp. Of this sequence, 73% comprised repetitive elements, and a total of 16,736 genes were predicted from the RNA-seq data. The predicted genes included those encoding effector-like proteins with high sequence similarity to those previously identified in other oomycete pathogens. Genes encoding jacalin-like lectin-domain-containing secreted proteins were enriched in S. graminicola compared to other oomycetes. Of a total of 1220 genes encoding putative secreted proteins, 91 significantly changed their expression levels during the infection of plant tissues compared to the sporangia and zoospore stages of the S. graminicola lifecycle. We established the draft genome sequence of a downy mildew pathogen that infects Gramineae plants. Based on this sequence and our transcriptome analysis, we generated a catalog of in planta-induced candidate effector genes, providing a solid foundation from which to identify the effectors causing phyllody.
Sun, Fengming; Fan, Guangyi; Hu, Qiong; Zhou, Yongming; Guan, Mei; Tong, Chaobo; Li, Jiana; Du, Dezhi; Qi, Cunkou; Jiang, Liangcai; Liu, Weiqing; Huang, Shunmou; Chen, Wenbin; Yu, Jingyin; Mei, Desheng; Meng, Jinling; Zeng, Peng; Shi, Jiaqin; Liu, Kede; Wang, Xi; Wang, Xinfa; Long, Yan; Liang, Xinming; Hu, Zhiyong; Huang, Guodong; Dong, Caihua; Zhang, He; Li, Jun; Zhang, Yaolei; Li, Liangwei; Shi, Chengcheng; Wang, Jiahao; Lee, Simon Ming-Yuen; Guan, Chunyun; Xu, Xun; Liu, Shengyi; Liu, Xin; Chalhoub, Boulos; Hua, Wei; Wang, Hanzhong
2017-11-01
Allotetraploid oilseed rape (Brassica napus L.) is an agriculturally important crop. Cultivation and breeding of B. napus by humans has resulted in numerous genetically diverse morphotypes with optimized agronomic traits and ecophysiological adaptation. To further understand the genetic basis of diversification and adaptation, we report a draft genome of an Asian semi-winter oilseed rape cultivar 'ZS11' and its comprehensive genomic comparison with the genomes of the winter-type cultivar 'Darmor-bzh' as well as two progenitors. The integrated BAC-to-BAC and whole-genome shotgun sequencing strategies were effective in the assembly of repetitive regions (especially young long terminal repeats) and resulted in a high-quality genome assembly of B. napus 'ZS11'. Within a short evolutionary period (~6700 years ago), semi-winter-type 'ZS11' and the winter-type 'Darmor-bzh' maintained high genomic collinearity. Even so, certain genetic differences were also detected between the two morphotypes. Relative to 'Darmor-bzh', both subgenomes of 'ZS11' are closely related to its progenitors, and the 'ZS11' genome harbored several specific segmental homoeologous exchanges (HEs). Furthermore, the semi-winter-type 'ZS11' underwent potential genomic introgressions with B. rapa (Ar). Some of these genetic differences were associated with key agronomic traits. A key gene, A03.FLC3, which regulates vernalization-responsive flowering time in 'ZS11', first experienced an HE and then underwent a genomic introgression event with Ar, which has potentially led to genetic differences in the control of vernalization in the semi-winter types. Our observations improved our understanding of the genetic diversity of different B. napus morphotypes and the cultivation history of semi-winter oilseed rape in Asia. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Dwivedi, Vatsala; Sangwan, Naseer; Nigam, Aeshna; Garg, Nidhi; Niharika, Neha; Khurana, Paramjit; Khurana, Jitendra P.
2012-01-01
Thermus sp. strain RL was isolated from a hot water spring (90°C to 98°C) at Manikaran, Himachal Pradesh, India. Here we report the draft genome sequence (2,036,600 bp) of this strain. The draft genome sequence consists of 17 contigs and 1,986 protein-coding sequences and has an average G+C content of 68.77%. PMID:22689228
Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations
Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter
2017-01-01
The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
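The idea of exact k-mer matching between target and non-target genomes can be illustrated with plain set operations. The sketch below is a deliberately simplified stand-in and is not Neptune's implementation or API: it requires a k-mer to occur in every target and in no non-target, whereas the abstract describes probabilistic "sufficiently common" and "sufficiently absent" criteria with tolerance for mismatches. The function names and toy sequences are hypothetical, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, non_targets, k):
    """k-mers found in every target sequence and in no non-target sequence.

    Simplifying assumption: strict presence/absence, with no mismatch
    tolerance and no statistical model.
    """
    if not targets:
        raise ValueError("at least one target sequence is required")
    shared = set.intersection(*(kmers(s, k) for s in targets))
    background = set().union(*(kmers(s, k) for s in non_targets))
    return shared - background


# Toy sequences for illustration only.
targets = ["AAACGTTT", "CCACGTGG"]
non_targets = ["AAATTT"]
print(signature_kmers(targets, non_targets, 4))
```

Because the comparison reduces to set membership, no alignment between genomes is needed, which is the property that makes k-mer strategies attractive for fragmented draft assemblies.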
Hargreaves, Katherine R; Otieno, James R; Thanki, Anisha; Blades, Matthew J; Millard, Andrew D; Browne, Hilary P; Lawley, Trevor D; Clokie, Martha R J
2015-05-27
The bacterium Clostridium difficile is a significant cause of nosocomial infections worldwide. The pathogenic success of this organism can be attributed to its flexible genome, which is characterized by the exchange of mobile genetic elements and by ongoing genome evolution. Despite its pathogenic status, C. difficile can also be carried asymptomatically, and has been isolated from natural environments such as water and sediments where multiple strain types (ribotypes) are found in close proximity. These include ribotypes which are associated with disease, as well as those that are less commonly isolated from patients. Little is known about the genomic content of strains in such reservoirs in the natural environment. In this study, draft genomes have been generated for 13 C. difficile isolates from estuarine sediments, including clinically relevant and environmentally associated types. To identify the genetic diversity within this strain collection, whole-genome comparisons were performed using the assemblies. The strains are highly genetically diverse with regard to the C. difficile "mobilome," which includes transposons and prophage elements. We identified a novel transposon-like element in two R078 isolates. Multiple, related and unrelated, prophages were detected in isolates across ribotype groups, including two novel prophage elements and those related to the transducing phage φC2. The susceptibility of these isolates to lytic phage infection was tested using a panel of characterized phages from the same locality. In conclusion, estuarine sediments are a source of genetically diverse C. difficile strains with a complex network of prophages, which could contribute to the emergence of new strains in clinics. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Hargreaves, Katherine R.; Otieno, James R.; Thanki, Anisha; Blades, Matthew J.; Millard, Andrew D.; Browne, Hilary P.; Lawley, Trevor D.; Clokie, Martha R.J.
2015-01-01
The bacterium Clostridium difficile is a significant cause of nosocomial infections worldwide. The pathogenic success of this organism can be attributed to its flexible genome, which is characterized by the exchange of mobile genetic elements and by ongoing genome evolution. Despite its pathogenic status, C. difficile can also be carried asymptomatically, and has been isolated from natural environments such as water and sediments where multiple strain types (ribotypes) are found in close proximity. These include ribotypes which are associated with disease, as well as those that are less commonly isolated from patients. Little is known about the genomic content of strains in such reservoirs in the natural environment. In this study, draft genomes have been generated for 13 C. difficile isolates from estuarine sediments, including clinically relevant and environmentally associated types. To identify the genetic diversity within this strain collection, whole-genome comparisons were performed using the assemblies. The strains are highly genetically diverse with regard to the C. difficile “mobilome,” which includes transposons and prophage elements. We identified a novel transposon-like element in two R078 isolates. Multiple, related and unrelated, prophages were detected in isolates across ribotype groups, including two novel prophage elements and those related to the transducing phage φC2. The susceptibility of these isolates to lytic phage infection was tested using a panel of characterized phages from the same locality. In conclusion, estuarine sediments are a source of genetically diverse C. difficile strains with a complex network of prophages, which could contribute to the emergence of new strains in clinics. PMID:26019165
Creating reference gene annotation for the mouse C57BL6/J genome assembly.
Mudge, Jonathan M; Harrow, Jennifer
2015-10-01
Annotation of the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL6/J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.
Draft genome sequence of the D-Xylose-Fermenting yeast Spathaspora xylofermentans UFMG-HMD23.3
USDA-ARS's Scientific Manuscript database
Here, we report the draft genome sequence of the yeast Spathaspora xylofermentans UFMG-HMD23.3 (CBMAI 1427=CBS 12681), a D-xylose fermenting yeast isolated from the Amazonian forest. The genome consists of 298 contigs, with a total size of 15.1 Mb, including the mitochondrial genome, and 5,948 predi...
USDA-ARS's Scientific Manuscript database
The coffee berry borer, Hypothenemus hampei, is the most economically important insect pest of coffee worldwide, causing millions of dollars in yearly losses to coffee growers. We present the third genomic analysis for a Coleopteran species, a draft genome of female coffee berry borers. The genome s...
A draft genome sequence of “Candidatus Liberibacter asiaticus” from California, USA
USDA-ARS's Scientific Manuscript database
The draft genome sequence of “Candidatus Liberibacter asiaticus” strain HHCA, collected from a lemon tree in California, USA, is reported. The HHCA strain has a genome size of 1,118,244 bp, with G+C content of 36.6%. The HHCA genome encodes 1,191 predicted open reading frames and 51 RNA genes....
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-05-16
Bradyrhizobium sp. Th.b2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Amphicarpaea bracteata collected in Johnson City, New York. Here we describe the features of Bradyrhizobium sp. Th.b2, together with high-quality permanent draft genome sequence information and annotation. The 10,118,060 bp high-quality draft genome is arranged in 266 scaffolds of 274 contigs and contains 9,809 protein-coding genes and 108 RNA-only encoding genes. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Rui; Parker, Matthew; Seshadri, Rekha
Bradyrhizobium sp. Th.b2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Amphicarpaea bracteata collected in Johnson City, New York. Here we describe the features of Bradyrhizobium sp. Th.b2, together with high-quality permanent draft genome sequence information and annotation. The 10,118,060 bp high-quality draft genome is arranged in 266 scaffolds of 274 contigs and contains 9,809 protein-coding genes and 108 RNA-only encoding genes. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing; Liu, Jun; Hu, Min; Li, Sheng-Jin; Kuang, Jia-Liang; Chain, Patrick S G; Huang, Li-Nan; Shu, Wen-Sheng
2015-06-01
High-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a 'divide and conquer' strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We report the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Our study demonstrates the potential of the 'divide and conquer' strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.
Draft genome of bagasse-degrading bacteria Bacillus aryabhattai GZ03 from deep sea water.
Wen, Jian; Ren, Chong; Huang, Nan; Liu, Yang; Zeng, Runying
2015-02-01
Bacillus aryabhattai GZ03, which can produce glucose and fructose by degrading bagasse at 25 °C, was isolated from deep sea water of the South China Sea. Here we report the draft genome sequence of Bacillus aryabhattai GZ03. The data obtained revealed 37 contigs with a genome size of 5,105,129 bp and a G+C content of 38.09%. The draft genome of B. aryabhattai GZ03 may provide insights into the mechanism of microbial carbohydrate and lignocellulosic material degradation. Copyright © 2014 Elsevier B.V. All rights reserved.
Schwartz, John C; Gibson, Mark S; Heimeier, Dorothea; Koren, Sergey; Phillippy, Adam M; Bickhart, Derek M; Smith, Timothy P L; Medrano, Juan F; Hammond, John A
2017-04-01
Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.
Susanti, Dwi; Johnson, Eric F; Lapidus, Alla; Han, James; Reddy, T B K; Pilay, Manoj; Ivanova, Natalia N; Markowitz, Victor M; Woyke, Tanja; Kyrpides, Nikos C; Mukhopadhyay, Biswarup
2016-01-01
This report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. The existence of several hydrogenase genes in the genome supported previous findings that H2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilization systems also exist in the D. mobilis genome, though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional means for comparative genomic analysis of desulfurococci. In addition, our analysis of the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Susanti, Dwi; Johnson, Eric F.; Lapidus, Alla
Our report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. The existence of several hydrogenase genes in the genome supported previous findings that H2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilization systems also exist in the D. mobilis genome, though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional means for comparative genomic analysis of desulfurococci. In addition, our analysis of the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.
Susanti, Dwi; Johnson, Eric F.; Lapidus, Alla; ...
2016-01-13
Our report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. The existence of several hydrogenase genes in the genome supported previous findings that H2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilization systems also exist in the D. mobilis genome, though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional means for comparative genomic analysis of desulfurococci. In addition, our analysis of the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.
Draft genome sequences of Actinomyces timonensis strain 7400942T and its prophage.
Gorlas, Aurore; Gimenez, Grégory; Raoult, Didier; Roux, Véronique
2012-12-01
A draft genome sequence of Actinomyces timonensis, an anaerobic bacterium isolated from a human clinical osteoarticular sample, is described here. CRISPR-associated proteins, insertion sequence, and toxin-antitoxin loci were found on the genome. A new virus or provirus, AT-1, was characterized.
Venturia carpophila draft genome sequence
USDA-ARS's Scientific Manuscript database
Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...
Draft genome sequence of rice orange leaf phytoplasma from Guangdong, China
USDA-ARS's Scientific Manuscript database
The genome of rice orange leaf phytoplasma strain LD1 from Luoding City, Guangdong, P. R. China, was sequenced. The draft LD1 genome is 599,264 bp with a GC content of 28.2%, 647 predicted open reading frames and 33 RNA genes....
Seuylemezian, Arman; Vaishampayan, Parag; Cooper, Kerry
2018-01-01
We report here the draft genome sequences of four strains isolated from spacecraft-associated surfaces exhibiting increased resistance to stressors such as UV radiation and exposure to H2O2. The draft genomes of strains 1P01SCT, FO-92T, 50v1, and 2P01AA had sizes of 5,500,894 bp, 4,699,376 bp, 3,174,402 bp, and 4,328,804 bp, respectively. PMID:29439046
Shur, K V; Zaychikova, M V; Mikheecheva, N E; Klimina, K M; Bekker, O B; Zhdanova, S N; Ogarkov, O B; Danilenko, V N
2016-12-01
We report a draft genome sequence of Mycobacterium tuberculosis strain B9741, belonging to the Beijing B0/W lineage, isolated from an HIV patient from Siberia, Russia. This clinical isolate showed an MDR phenotype and resistance to isoniazid, rifampin, streptomycin and pyrazinamide. We analyzed SNPs associated with virulence and resistance. The draft genome sequence and annotation have been deposited at GenBank under the accession NZ_LVJJ00000000.
Draft Genome Sequence of Candida pseudohaemulonii Isolated from the Blood of a Neutropenic Patient.
Mohd Tap, Ratna; Kamarudin, Nur Amalina; Ginsapu, Stephanie Jane; Ahmed Bakri, Ahmed Rafezzan; Ahmad, Norazah; Amran, Fairuz; Sipiczki, Matthias
2018-04-05
Candida pseudohaemulonii is phylogenetically close to the C. haemulonii complex and exhibits resistance to amphotericin B and azole agents. We report here the draft genome sequence of C. pseudohaemulonii UZ153_17 isolated from the blood culture of a neutropenic patient. The draft genome is 3,532,003,666 bp in length, with 579,838 reads, 130 contigs, and a G+C content of 47.15%. Copyright © 2018 Mohd Tap et al.
Kanost, Michael R.; Arrese, Estela L.; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G.; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L.; Vogel, Heiko; Walters, James; Waterhouse, Robert M.; Ahn, Seung-Joon; Almeida, Francisca C.; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B.; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M.; Clarke, David F.; Dittmer, Neal T.; Ferguson, Laura C.F.; Garavelou, Spyridoula; Gordon, Karl H.J.; Gunaratna, Ramesh T.; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R.; Kuwar, Suyog S.; Lee, Sandy L.; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H.; McCulloch, Kyle J.; Mathew, Tittu; Morton, Brian; Muzny, Donna M.; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K.; Worley, Kim C.; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D.; Burmester, Thorsten; Clem, Rollie J.; Feyereisen, René; Grimmelikhuijzen, Cornelis J.P; Hamodrakas, Stavros J.; Hansson, Bill S.; Huguet, Elisabeth; Jermiin, Lars S.; Lan, Que; Lehman, Herman K.; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B.; Muthukrishnan, Subbaratnam; Oakeshott, John G.; Palmer, Will; Park, Yoonseong; Passarelli, A. Lorena; Rozas, Julio; Schwartz, Lawrence M.; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J.; Scherer, Steven E.; Richards, Stephen; Blissard, Gary W.
2016-01-01
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922
Kawakami, Takeshi; Backström, Niclas; Burri, Reto; Husby, Arild; Olason, Pall; Rice, Amber M; Ålund, Murielle; Qvarnström, Anna; Ellegren, Hans
2014-01-01
With access to draft genome sequence assemblies and whole-genome resequencing data from population samples, molecular ecology studies will be able to take truly genome-wide approaches. This now applies to an avian model system in ecological and evolutionary research: Old World flycatchers of the genus Ficedula, for which we recently obtained a 1.1 Gb collared flycatcher genome assembly and identified 13 million single-nucleotide polymorphisms (SNPs) in population resequencing of this species and its sister species, pied flycatcher. Here, we developed a custom 50K Illumina iSelect flycatcher SNP array with markers covering 30 autosomes and the Z chromosome. Using a number of selection criteria for inclusion in the array, both genotyping success rate and polymorphism information content (mean marker heterozygosity = 0.41) were high. We used the array to assess linkage disequilibrium (LD) and hybridization in flycatchers. Linkage disequilibrium declined quickly to the background level at an average distance of 17 kb, but the extent of LD varied markedly within the genome and was more than 10-fold higher in ‘genomic islands’ of differentiation than in the rest of the genome. Genetic ancestry analysis identified 33 F1 hybrids but no later-generation hybrids from sympatric populations of collared flycatchers and pied flycatchers, contradicting earlier reports of backcrosses identified from a much smaller number of markers. With an estimated divergence time as recent as <1 Ma, this suggests strong selection against F1 hybrids and unusually rapid evolution of reproductive incompatibility in an avian system. PMID:24784959
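The linkage disequilibrium measured in the abstract above is commonly summarized per marker pair with the r² statistic, defined as r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The following Python sketch computes it for one pair of biallelic markers. It assumes phased haplotypes coded 0/1, which is a simplification chosen for illustration; the abstract does not state how the authors estimated LD from the array genotypes, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded 0/1.

    Each tuple holds the allele at locus A and the allele at locus B
    on the same chromosome copy.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


# Toy data: alleles always co-occur (complete LD) versus no association.
complete = ld_r_squared([(0, 0), (0, 0), (1, 1), (1, 1)])
independent = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Plotting r² against the physical distance between marker pairs gives the kind of decay curve from which a background-level distance can be read.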
A catalogue of 136 microbial draft genomes from Red Sea metagenomes
NASA Astrophysics Data System (ADS)
Haroon, Mohamed F.; Thompson, Luke R.; Parks, Donovan H.; Hugenholtz, Philip; Stingl, Ulrich
2016-07-01
Earth is expected to continue warming and the Red Sea is a model environment for understanding the effects of global warming on ocean microbiomes due to its unusually high temperature, salinity and solar irradiance. However, most microbial diversity analyses of the Red Sea have been limited to cultured representatives and single marker gene analyses, hence neglecting the substantial uncultured majority. Here, we report 136 microbial genomes (completion minus contamination is ≥50%) assembled from 45 metagenomes from eight stations spanning the Red Sea and taken from multiple depths between 10 and 500 m. Phylogenomic analysis showed that most of the retrieved genomes belong to seven different phyla of known marine microbes, with more than half representing currently uncultured species. The open-access data presented here is the largest number of Red Sea representative microbial genomes reported in a single study and will help facilitate future studies in understanding the physiology of these microorganisms and how they have adapted to the relatively harsh conditions of the Red Sea.
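The inclusion criterion quoted in the abstract above (completion minus contamination ≥50%) amounts to a simple filter over per-genome quality estimates. The Python sketch below shows one way to apply it; the `GenomeBin` class, the bin names, and the percentages are hypothetical values for illustration, and the abstract does not say which tool produced the completeness and contamination estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenomeBin:
    name: str             # hypothetical bin identifier
    completeness: float   # estimated completeness, in percent
    contamination: float  # estimated contamination, in percent


def quality_score(genome: GenomeBin) -> float:
    """Completeness minus contamination, both in percent."""
    return genome.completeness - genome.contamination


def filter_bins(bins: List[GenomeBin], threshold: float = 50.0) -> List[GenomeBin]:
    """Keep bins whose quality score meets the threshold (>= 50 by default)."""
    return [b for b in bins if quality_score(b) >= threshold]


# Made-up example values, not data from the study.
bins = [
    GenomeBin("bin_A", completeness=92.0, contamination=3.5),
    GenomeBin("bin_B", completeness=61.0, contamination=14.0),
    GenomeBin("bin_C", completeness=55.0, contamination=5.0),
]
kept = [b.name for b in filter_bins(bins)]
```

Subtracting contamination penalizes bins that look complete only because they merge sequence from more than one organism.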
Damerau, Malte; Matschiner, Michael; Jentoft, Sissel
2017-01-01
Recent developments in the field of genomics have provided new and powerful insights into population structure and dynamics that are essential for the conservation of biological diversity. As a commercially highly valuable species, the yellowfin tuna (Thunnus albacares) is intensely exploited throughout its distribution in tropical oceans around the world, and is currently classified as near threatened. However, conservation efforts for this species have so far been hampered by limited knowledge of its population structure, due to incongruent results of previous investigations. Here, we use whole-genome sequencing in concert with a draft genome assembly to decipher the global population structure of the yellowfin tuna, and to investigate its demographic history. We detect significant differentiation of Atlantic and Indo-Pacific yellowfin tuna populations as well as the possibility of a third diverged yellowfin tuna group in the Arabian Sea. We further observe evidence for past population expansion as well as asymmetric gene flow from the Indo-Pacific to the Atlantic. PMID:28419285
Finkel, Omri M; Delmont, Tom O; Post, Anton F; Belkin, Shimshon
2016-05-01
The leaves of Tamarix aphylla, a globally distributed, salt-secreting desert tree, are dotted with alkaline droplets of high salinity. To successfully inhabit these organic carbon-rich droplets, bacteria need to be adapted to multiple stress factors, including high salinity, high alkalinity, high UV radiation, and periodic desiccation. To identify genes that are important for survival in this harsh habitat, microbial community DNA was extracted from the leaf surfaces of 10 Tamarix aphylla trees along a 350-km longitudinal gradient. Shotgun metagenomic sequencing, contig assembly, and binning yielded 17 genome bins, six of which were >80% complete. These genomic bins, representing three phyla (Proteobacteria, Bacteroidetes, and Firmicutes), were closely related to halophilic and alkaliphilic taxa isolated from aquatic and soil environments. Comparison of these genomic bins to the genomes of their closest relatives revealed functional traits characteristic of bacterial populations inhabiting the Tamarix phyllosphere, independent of their taxonomic affiliation. These functions, most notably light-sensing genes, are postulated to represent important adaptations toward colonization of this habitat. Plant leaves are an extensive and diverse microbial habitat, forming the main interface between solar energy and the terrestrial biosphere. There are hundreds of thousands of plant species in the world, exhibiting a wide range of morphologies, leaf surface chemistries, and ecological ranges. In order to understand the core adaptations of microorganisms to this habitat, it is important to diversify the type of leaves that are studied. This study provides an analysis of the genomic content of the most abundant bacterial inhabitants of the globally distributed, salt-secreting desert tree Tamarix aphylla. Draft genomes of these bacteria were assembled, using the culture-independent technique of assembly and binning of metagenomic data. Analysis of the genomes reveals traits that are important for survival in this habitat, most notably, light-sensing and light utilization genes. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Draft Genome Sequence of Aspergillus oryzae ATCC 12892
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deng, Shuang; Pomraning, Kyle R.; Bohutskyi, Pavlo
The draft genome sequence of Aspergillus oryzae ATCC 12892 is presented here. A. oryzae produces 3-nitropropionic acid, which has been investigated with regard to understanding the biosynthesis of nitroorganic compounds.
Draft Genome Sequence of Mycobacterium asiaticum Strain DSM 44297.
Croce, Olivier; Robert, Catherine; Raoult, Didier; Drancourt, Michel
2014-04-17
We report the draft genome sequence of Mycobacterium asiaticum strain DSM 44297, a tropical mycobacterium seldom responsible for human infection. The genome of M. asiaticum has a size of 5,935,986 bp, with a 66.03% G+C content, encoding 5,591 proteins and 81 RNAs.
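Draft genome announcements like the one above typically report total assembly size and G+C content. As a rough illustration of how those two figures follow from a set of contig sequences, here is a minimal Python sketch. The function name and the toy contigs are assumptions for illustration, and excluding ambiguous bases such as N from the G+C denominator is a convention chosen here, not something stated in the abstracts.

```python
from typing import Dict, List, Union


def assembly_summary(contigs: List[str]) -> Dict[str, Union[int, float]]:
    """Summarize an assembly: contig count, total length, and G+C percentage.

    G+C content is computed over unambiguous bases (A, C, G, T) only,
    so runs of N do not dilute the percentage.
    """
    total_bp = 0
    gc = 0
    acgt = 0
    for contig in contigs:
        seq = contig.upper()
        total_bp += len(seq)
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")

    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return {"contigs": len(contigs), "total_bp": total_bp, "gc_percent": gc_percent}


# Toy contigs, not real sequence data.
summary = assembly_summary(["GGCC", "ATAT", "GCAT"])
```

For real assemblies the contigs would be read from a FASTA file; that parsing step is omitted to keep the sketch self-contained.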
Ross, Daniel E.; Gulliver, Djuna
2016-10-06
The draft genome sequence of Pseudomonas stutzeri strain K35 was separated from a metagenome derived from a produced water microbial community of a coalbed methane well. The genome encodes a complete nitrogen fixation pathway and the upper and lower naphthalene degradation pathways.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ross, Daniel E.; Gulliver, Djuna
The draft genome sequence of Pseudomonas stutzeri strain K35 was separated from a metagenome derived from a produced water microbial community of a coalbed methane well. The genome encodes a complete nitrogen fixation pathway and the upper and lower naphthalene degradation pathways.
Takagi, Misako; Nakano, Akiyo; Toh, Hidehiro; Oshima, Kenshiro; Arakawa, Kensuke; Nakajima, Fumihiko; Tashiro, Kosuke; Kikusui, Tekefumi; Yanagida, Fujitoshi; Morita, Hidetoshi
2016-01-14
Streptococcus orisasini SH06 was isolated from a healthy thoroughbred gastrointestinal tract. Here, we report the draft genome sequence of this organism. This paper is the first published report of the genomic sequence of S. orisasini. Copyright © 2016 Takagi et al.
Draft Genome Sequence of Lactobacillus panis DSM 6035T, First Isolated from Sourdough
Zhu, Yixin; Fang, Daiqiong; Shi, Ding; Li, Ang; Lv, Longxian; Yan, Ren; Yao, Jian; Hua, Dasong; Hu, Xinjun; Guo, Feifei; Wu, Wenrui; Guo, Jing; Chen, Yanfei; Jiang, Xiawei; Chen, Xiaoxiao
2015-01-01
We report a draft genome sequence of Lactobacillus panis DSM 6035T, isolated from sourdough. The genome of this strain is 2,082,789 bp long, with 47.9% G+C content. A total of 2,047 protein-coding genes were predicted. PMID:26205855
Draft genome sequence of Xylella fastidiosa pear leaf scorch strain in Taiwan
USDA-ARS's Scientific Manuscript database
The draft genome sequence of Xylella fastidiosa pear leaf scorch strain (PLS229) isolated from pear cultivar Hengshan (Pyrus pyrifolia) in Taiwan is reported. The bacterium has a genome size of 2,733,013 bp with a G+C content of 53.1%. The PLS229 strain genome was annotated to have 3,259 open readin...
The aquatic animals' transcriptome resource for comparative functional analysis.
Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da
2018-05-09
Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, no comprehensive resource currently exists for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database, dbATM, provides de novo transcriptome assembly, gene annotation, and comparative analysis for more than twenty aquatic organisms that lack a draft genome. To improve assembly quality, three computational tools (Trinity, Oases, and SOAPdenovo-Trans) were employed for individual transcriptome assembly, and CAP3 and CD-HIT-EST were then used to merge the three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology, and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and BLAST databases and performing phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publicly accessible at http://dbATM.mbc.nctu.edu.tw/.
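The merging step described in this abstract, pooling transcripts from several assemblers and collapsing redundant ones, can be illustrated with a toy sketch. This is not how CAP3 or CD-HIT-EST work internally: those tools use overlap assembly and similarity thresholds, whereas the hypothetical `merge_assemblies` function below only drops exact duplicates and sequences fully contained in a longer one, and it ignores reverse complements. All sequences are made up for illustration.

```python
def merge_assemblies(*assemblies: list[str]) -> list[str]:
    """Pool transcripts and drop any that are exact substrings of a longer kept one.

    A simplified stand-in for redundancy removal; real tools cluster by
    sequence similarity rather than exact containment.
    """
    # Pool everything into a set, which removes exact duplicates across assemblers.
    pooled = {seq.upper() for assembly in assemblies for seq in assembly if seq}
    kept: list[str] = []
    # Visit longest sequences first so fragments are compared against full-length ones.
    for seq in sorted(pooled, key=lambda s: (-len(s), s)):
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Hypothetical outputs from three assemblers for the same sample.
assembler_a = ["ATGGCCAAGTAA", "ATGGCC"]
assembler_b = ["GGCCAAG", "TTTTCCCC"]
assembler_c = ["ATGGCCAAGTAA"]

merged = merge_assemblies(assembler_a, assembler_b, assembler_c)
print(merged)
```

The longest-first ordering matters: if short fragments were visited first, they would be kept before the full-length transcript that contains them was seen.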
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-05-17
Bradyrhizobium sp. Tv2a.2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Tachigali versicolor collected on Barro Colorado Island, Panama. Here we describe the features of Bradyrhizobium sp. Tv2a.2, together with high-quality permanent draft genome sequence information and annotation. The 8,496,279 bp high-quality draft genome is arranged in 87 scaffolds of 87 contigs and contains 8,109 protein-coding genes and 72 RNA-only encoding genes. In conclusion, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2015-01-01
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Brown, Steven D.; Utturkar, Sagar M.; Magnuson, Timothy S.; ...
2014-09-04
Pelosinus fermentans strain R7 was isolated from Russian kaolin clays as the type strain and it can reduce Fe(III) during fermentative growth (1). Draft genome sequences for P. fermentans R7 and four strains from Hanford, Washington, USA, have been published (2–4). The P. fermentans 16S rRNA sequence dominated the lactate-based enrichment cultures from three geochemically contrasting soils from the Melton Branch Watershed, Oak Ridge, Tennessee, USA (5) and also at another stimulated, uranium-contaminated field site near Oak Ridge (6). For the current work, strain UFO1 was isolated from pristine sediments at a background field site in Oak Ridge and characterized as facilitating U(VI) reduction and precipitation with phosphate (7).
Shearman, Jeremy R.; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke
2015-01-01
Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly. PMID:25831195
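The proportions quoted in this abstract can be checked with simple arithmetic. The sketch below uses only the rounded figures given in the text (115 Mb anchored, a 1.1 Gb assembly, a 2.15 Gb genome, 20,143 of 69,300 genes); the helper name `percent` is an illustrative assumption. With these rounded inputs the anchored fraction comes out close to the reported 10.4%, and any small discrepancy is presumably due to rounding of the published totals.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Rounded figures taken from the abstract, all in megabases (Mb).
anchored_mb = 115.0
assembly_mb = 1100.0   # 1.1 Gb published assembly
genome_mb = 2150.0     # roughly 2.15 Gb estimated genome size

# Gene counts from the abstract.
genes_on_map = 20143
predicted_genes = 69300

seq_fraction = percent(anchored_mb, assembly_mb)
gene_fraction = percent(genes_on_map, predicted_genes)
assembly_coverage = percent(assembly_mb, genome_mb)

print(f"Anchored share of assembly: {seq_fraction:.1f}%")
print(f"Predicted genes on the map: {gene_fraction:.1f}%")
print(f"Assembly as share of genome: {assembly_coverage:.1f}%")
```

Read together, the figures show that the map places close to a third of predicted genes while anchoring only about a tenth of the assembled sequence, and that the assembly itself covers roughly half of the estimated genome.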
Hand, Brian K.; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon
2015-01-01
Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.
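The notion of a species-diagnostic SNP used in this abstract can be made concrete with a small sketch. It assumes a common working definition: a locus is diagnostic when every sampled non-hybridized individual of one species carries one allele and every sampled individual of the other species carries a different one. The function name, data layout, and toy genotypes are hypothetical, and the study's actual filtering criteria are not given in the abstract.

```python
def is_diagnostic(alleles_a: list[str], alleles_b: list[str], missing: str = "N") -> bool:
    """Return True if each species is fixed for a single, different allele."""
    observed_a = set(alleles_a) - {missing}
    observed_b = set(alleles_b) - {missing}
    # Each species must be monomorphic, and the two fixed alleles must differ.
    return len(observed_a) == 1 and len(observed_b) == 1 and observed_a != observed_b


# Hypothetical allele calls per locus: (rainbow trout sample, cutthroat trout sample).
loci = {
    "locus_1": (["A", "A", "A"], ["G", "G", "G"]),  # fixed difference
    "locus_2": (["A", "A", "G"], ["G", "G", "G"]),  # polymorphic in one species
    "locus_3": (["C", "C", "N"], ["T", "T", "T"]),  # fixed difference, one missing call
    "locus_4": (["T", "T", "T"], ["T", "T", "T"]),  # same allele in both species
}

diagnostic = [name for name, (rbt, wct) in loci.items() if is_diagnostic(rbt, wct)]
print(diagnostic)

# Share of previously reported loci that lost diagnostic status (figures from the abstract).
lost_share = 100.0 * 325 / 4914
print(f"{lost_share:.1f}% of earlier loci were no longer diagnostic")
```

As `locus_2` illustrates, adding individuals can only remove loci from the diagnostic set, which is consistent with the abstract's report that a wider survey reclassified some previously diagnostic loci.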