USDA-ARS's Scientific Manuscript database
Sequence contigs assembled by the SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....
BAC sequencing using pooled methods.
Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina
2015-01-01
Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically yield highly fragmented assemblies from which biological meaning is difficult to extract. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences only the genome fraction that encodes the relevant biological information, it is possible to reduce the overall sequencing cost and effort by targeting that genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. The methods described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Sequencing of individual chromosomes of plant pathogenic Fusarium oxysporum.
Kashiwa, Takeshi; Kozaki, Toshinori; Ishii, Kazuo; Turgeon, B Gillian; Teraoka, Tohru; Komatsu, Ken; Arie, Tsutomu
2017-01-01
A small chromosome in reference isolate 4287 of F. oxysporum f. sp. lycopersici (Fol) has been designated as a 'pathogenicity chromosome' because it carries several pathogenicity related genes such as the Secreted In Xylem (SIX) genes. Sequence assembly of small chromosomes in other isolates, based on a reference genome template, is difficult because of karyotype variation among isolates and a high number of sequences associated with transposable elements. These factors often result in misassembly of sequences, making it unclear whether other isolates possess the same pathogenicity chromosome harboring SIX genes as in the reference isolate. To overcome this difficulty, single chromosome sequencing after Contour-clamped Homogeneous Electric Field (CHEF) separation of chromosomes was performed, followed by de novo assembly of sequences. The assembled sequences of individual chromosomes were consistent with results of probing gels of CHEF separated chromosomes with SIX genes. Individual chromosome sequencing revealed that several SIX genes are located on a single small chromosome in two pathogenic forms of F. oxysporum, beyond the reference isolate 4287, and in the cabbage yellows fungus F. oxysporum f. sp. conglutinans. The particular combination of SIX genes on each small chromosome varied. Moreover, not all SIX genes were found on small chromosomes; depending on the isolate, some were on big chromosomes. This suggests that recombination of chromosomes and/or translocation of SIX genes may occur frequently. Our method improves sequence comparison of small chromosomes among isolates. Copyright © 2016 Elsevier Inc. All rights reserved.
Seuylemezian, Arman; Cooper, Kerry; Schubert, Wayne
2018-01-01
ABSTRACT Spore-forming microorganisms are of concern for forward contamination because they can survive harsh interplanetary travel. Here, we report the draft genome sequences of 12 spore-forming strains isolated from the Manned Spacecraft Operations Building (MSOB) and the Vehicle Assembly Building (VAB) in Cape Canaveral, FL, where the Viking spacecraft were assembled. PMID:29567731
Seuylemezian, Arman; Cooper, Kerry; Schubert, Wayne; Vaishampayan, Parag
2018-03-22
Spore-forming microorganisms are of concern for forward contamination because they can survive harsh interplanetary travel. Here, we report the draft genome sequences of 12 spore-forming strains isolated from the Manned Spacecraft Operations Building (MSOB) and the Vehicle Assembly Building (VAB) in Cape Canaveral, FL, where the Viking spacecraft were assembled. Copyright © 2018 Seuylemezian et al.
2009-01-01
Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. 
We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg
2009-08-06
Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.
Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph; Aury, Jean-Marc
2017-02-01
Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. © The Author 2017. Published by Oxford University Press.
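Contiguity comparisons such as the one above are often summarized with the N50 statistic. The abstract does not say which metric was used, so the following is only a minimal sketch of N50, with hypothetical contig lengths chosen for illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (bp), not taken from the study.
fragmented = [100, 200, 300, 400, 500]
contiguous = [1500]
print(n50(fragmented), n50(contiguous))
```

Both toy assemblies have the same total length, but the single-contig assembly has the larger N50, which is what a higher-contiguity assembly looks like under this metric.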
Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph
2017-01-01
Abstract Background: Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Results: Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Conclusion: Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. PMID:28369459
Single-cell isolation by a modular single-cell pipette for RNA-sequencing.
Zhang, Kai; Gao, Min; Chong, Zechen; Li, Ying; Han, Xin; Chen, Rui; Qin, Lidong
2016-11-29
Single-cell transcriptome sequencing requires a convenient and reliable method to rapidly isolate a live cell into a specific container such as a PCR tube. Here, we report a modular single-cell pipette (mSCP) consisting of three modular components, a SCP-Tip, an air-displacement pipette (ADP), and ADP-Tips, that can be easily assembled, disassembled, and reassembled. By assembling the SCP-Tip containing a hydrodynamic trap, the mSCP can isolate single cells from 5-10 cells per μL of cell suspension. The mSCP is compatible with microscopic identification of captured single cells to finally achieve 100% single-cell isolation efficiency. The isolated live single cells are in submicroliter volumes and well suited for single-cell PCR analysis and RNA-sequencing. The mSCP offers convenience, speed, and high efficiency, making it a powerful tool to isolate single cells for transcriptome analysis.
Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.
Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M
2017-08-16
High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
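The comparison of the two variant sets described above can be pictured as simple set arithmetic. The sketch below is an illustration only: the function name, the (position, reference allele, alternate allele) representation, and the toy calls are all assumptions, not the authors' pipeline or data.

```python
def compare_variant_calls(reference_based, de_novo_based):
    """Summarize the overlap between two sets of variant calls.

    Each variant is a hashable tuple such as (position, ref_allele, alt_allele).
    """
    shared = reference_based & de_novo_based
    union = reference_based | de_novo_based
    return {
        "shared": len(shared),
        "reference_only": len(reference_based - de_novo_based),
        "de_novo_only": len(de_novo_based - reference_based),
        # Fraction of all distinct variants found by both approaches.
        "fraction_shared": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical variant calls, for illustration only.
ref_calls = {(101, "A", "G"), (250, "C", "T"), (377, "G", "A"), (512, "T", "C")}
denovo_calls = {(101, "A", "G"), (250, "C", "T"), (377, "G", "A")}
print(compare_variant_calls(ref_calls, denovo_calls))
```

In this toy case three of four distinct variants are shared and one is seen only by the reference-based calls, mirroring the pattern (though not the numbers) reported in the abstract.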
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.
Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C
2012-09-11
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
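To put the 50x figure in concrete terms, mean depth of coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below applies that formula; the 4,000,000 bp genome size is an assumed round figure for a V. cholerae genome and is not stated in the abstract.

```python
def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth, genome_size_bp, read_length_bp):
    """Smallest number of reads whose total bases reach the target depth."""
    total_bases = target_depth * genome_size_bp
    return -(-total_bases // read_length_bp)  # integer ceiling division


ASSUMED_GENOME_SIZE_BP = 4_000_000  # assumption: approximate, for illustration
reads = reads_for_depth(50, ASSUMED_GENOME_SIZE_BP, 100)
print(reads, reads // 2)  # individual 100 bp reads, and read pairs
```

Under these assumptions, 50x coverage corresponds to about two million 100 bp reads, or one million read pairs, far fewer than the reads behind the >2000x data sets.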
Labbé, Geneviève; Ziebell, Kim; Bekal, Sadjia; Parmley, E. Jane; Agunos, Agnes; Desruisseau, Andrea; Daignault, Danielle; Slavic, Durda; Hoang, Linda; Ramsay, Danielle; Pollari, Frank; Robertson, James; Nash, John H. E.
2016-01-01
Salmonella enterica subsp. enterica serovar Heidelberg is a highly clonal serovar frequently associated with foodborne illness. To facilitate subtyping efforts, we report fully assembled genome sequences of 17 Canadian S. Heidelberg isolates including six pairs of epidemiologically related strains. The plasmid sequences of eight isolates contain several drug resistance genes. PMID:27635008
Labbé, Geneviève; Ziebell, Kim; Bekal, Sadjia; Macdonald, Kimberley A; Parmley, E Jane; Agunos, Agnes; Desruisseau, Andrea; Daignault, Danielle; Slavic, Durda; Hoang, Linda; Ramsay, Danielle; Pollari, Frank; Robertson, James; Nash, John H E; Johnson, Roger P
2016-09-15
Salmonella enterica subsp. enterica serovar Heidelberg is a highly clonal serovar frequently associated with foodborne illness. To facilitate subtyping efforts, we report fully assembled genome sequences of 17 Canadian S. Heidelberg isolates including six pairs of epidemiologically related strains. The plasmid sequences of eight isolates contain several drug resistance genes. © Crown copyright 2016.
Assembling in Sequence: A Saleable Work Skill. Occupation Simulation Packet. Grades 3rd-4th.
ERIC Educational Resources Information Center
Hueston, Jean
This teacher's guide for grades 3 and 4 contains simulated work experiences for students using the isolated skill concept - assembling in sequence. Teacher instructions include objectives, evaluation, and sequence of activities. The guide contains pre-tests and post-tests with instructions and answer keys. Three pre-skill activities are suggested,…
Genome assembly reborn: recent computational challenges
2009-01-01
Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain. PMID:19482960
Harnessing Whole Genome Sequencing in Medical Mycology.
Cuomo, Christina A
2017-01-01
Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
Finished genome assembly of warm spring isolate Francisella novicida DPG 3A-IS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Shannon L.; Minogue, Timothy D.; Daligault, Hajnalka E.
2015-09-17
We sequenced the complete genome of Francisella novicida DPG 3A-IS to closed and finished status. This is a warm spring isolate recovered from Hobo Warm Spring (Utah, USA). The last assembly is available in NCBI under accession number CP012037.
Ortiz, Elio M.; Berretta, Marcelo F.; Benintende, Graciela B.; Zandomeni, Rubén O.
2015-01-01
Geobacillus sp. isolate T6 was collected from a thermal spring in Salta, Argentina. The draft genome sequence (3,767,773 bp) of this isolate is represented by one major scaffold of 3.46 Mbp, a second one of 207 kbp, and 20 scaffolds of <13 kbp. The assembled sequences revealed 3,919 protein-coding genes. PMID:26184933
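The reported scaffold sizes can be checked against the total with a little arithmetic. The sketch below uses the rounded values from the abstract (3.46 Mbp and 207 kbp), so the remainder attributed to the 20 small scaffolds is approximate.

```python
def small_scaffold_summary(total_bp, large_scaffolds_bp, n_small):
    """Return (bases left for the small scaffolds, their mean length)."""
    remaining = total_bp - sum(large_scaffolds_bp)
    return remaining, remaining / n_small


# Values from the abstract; the two scaffold sizes are rounded as reported.
remaining_bp, mean_small_bp = small_scaffold_summary(
    total_bp=3_767_773,
    large_scaffolds_bp=[3_460_000, 207_000],
    n_small=20,
)
print(remaining_bp, mean_small_bp)
```

Roughly 100 kbp is left for the 20 small scaffolds, an average of about 5 kbp each, which is consistent with every one of them being shorter than 13 kbp.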
Genome assemblies for 11 Yersinia pestis strains isolated in the Caucasus region
Zhgenti, Ekaterine; Johnson, Shannon L.; Davenport, Karen W.; ...
2015-09-17
Yersinia pestis, the causative agent of plague, is endemic to the Caucasus region but few reference strain genome sequences from that region are available. We present the improved draft or finished assembled genomes from 11 strains isolated in the nation of Georgia and surrounding countries.
Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri
2016-07-14
Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. Copyright © 2016 Singh et al.
Deep Sequencing Analysis of Apple Infecting Viruses in Korea
Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun
2016-01-01
Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, and all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity, respectively, with ORF1 of ApLV isolate LA2. The deep sequencing assay was shown to be a valuable and powerful tool for detecting and identifying known and unknown viruses in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
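Nucleotide identity figures like those above are computed from a pairwise alignment of a contig against a reference. The abstract does not name the tool or the exact definition used, so the sketch below assumes one common definition (matching columns divided by alignment columns) and uses made-up, pre-aligned sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Other definitions exist (for example, excluding all gapped columns from the denominator), so identity values are only comparable when the same convention is used throughout.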
Salazar, Joelle K; Gonsalves, Lauren J; Schill, Kristin M; Sanchez Leon, Maria; Anderson, Nathan; Keller, Susanne E
2018-06-07
The genome of Listeria monocytogenes strain DFPST0073, isolated from imported fresh Mexican soft cheese in 2003, was sequenced using the Illumina MiSeq platform. Reads were assembled using SPAdes, and genome annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline.
Are commercial providers a viable option for clinical bacterial sequencing?
Raven, Kathy; Blane, Beth; Churcher, Carol; Parkhill, Julian; Peacock, Sharon J
2018-04-05
Bacterial whole-genome sequencing in the clinical setting has the potential to bring major improvements to infection control and clinical practice. Sequencing instruments are not currently available in the majority of routine microbiology laboratories worldwide, but an alternative is to use external sequencing providers. To foster discussion around this we investigated whether send-out services were a viable option. Four providers offering MiSeq sequencing were selected based on cost and evaluated based on the service provided and sequence data quality. DNA was prepared from five methicillin-resistant Staphylococcus aureus (MRSA) isolates, four of which were investigated during a previously published outbreak in the UK together with a reference MRSA isolate (ST22 HO 5096 0412). Cost of sequencing per isolate ranged from £155 to £342 and turnaround times from DNA postage to arrival of sequence data ranged from 12 to 63 days. Comparison of commercially generated genomes against the original sequence data demonstrated very high concordance, with no more than one single nucleotide polymorphism (SNP) difference on core genome mapping between the original sequences and the new sequence for all four providers. Multilocus sequence type could not be assigned based on assembly for the two cheapest sequence providers due to fragmented assemblies probably caused by a lower output of sequence data per isolate. Our results indicate that external providers returned highly accurate genome data, but that improvements are required in turnaround time to make this a viable option for use in clinical practice.
O'Hair, Joshua A.; Li, Hui; Thapa, Santosh; Scholz, Matthew B.
2017-01-01
ABSTRACT Novel cellulolytic microorganisms can potentially influence second-generation biofuel production. This paper reports the draft genome sequence of Bacillus licheniformis strain YNP1-TSU, isolated from hydrothermal-vegetative microbiomes inside Yellowstone National Park. The assembled sequence contigs predicted 4,230 coding genes, 66 tRNAs, and 10 rRNAs through automated annotation. PMID:28254968
Jang, Hyein; Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R
2018-04-12
Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%.
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Use of simulated data sets to evaluate the fidelity of Metagenomic processing methods
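The idea of building a simulated data set by pooling randomly selected reads from isolate genomes can be sketched as follows. This is not the authors' procedure: the function name, the per-isolate read counts, and the toy reads are hypothetical, and the source label is kept with each read only so that downstream results could be checked against the known origin.

```python
import random


def build_simulated_dataset(reads_by_isolate, reads_per_isolate, seed=0):
    """Pool randomly chosen reads from several isolates into one data set.

    reads_by_isolate: dict mapping isolate name -> list of read sequences.
    reads_per_isolate: dict mapping isolate name -> number of reads to draw.
    Returns a shuffled list of (isolate_name, read) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    dataset = []
    for isolate, n_reads in reads_per_isolate.items():
        # Sampling without replacement; raises ValueError if the pool is too small.
        picked = rng.sample(reads_by_isolate[isolate], n_reads)
        dataset.extend((isolate, read) for read in picked)
    rng.shuffle(dataset)
    return dataset


# Hypothetical read pools, for illustration only.
pools = {
    "isolate_A": ["ACGT", "CGTA", "GTAC", "TACG"],
    "isolate_B": ["TTGA", "TGAC", "GACC"],
}
simulated = build_simulated_dataset(pools, {"isolate_A": 3, "isolate_B": 1})
print(len(simulated))
```

Varying the per-isolate counts is one simple way to model communities of different complexity, from a few dominant members to an even mixture.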
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.
Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini
2018-03-01
Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Riveros-Mckay, Fernando; Campos, Itzia; Giles-Gómez, Martha; Bolívar, Francisco
2014-01-01
Leuconostoc mesenteroides P45 was isolated from the traditional Mexican pulque beverage. We report its draft genome sequence, assembled in 6 contigs consisting of 1,874,188 bp and no plasmids. Genome annotation predicted a total of 1,800 genes, 1,687 coding sequences, 52 pseudogenes, 9 rRNAs, 51 tRNAs, 1 noncoding RNA, and 44 frameshifted genes. PMID:25377708
Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D.; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R.
2018-01-01
ABSTRACT Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%. PMID:29650569
Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.
Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción
2016-02-27
In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.
Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N
2016-01-01
For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
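The MLST step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the general MLST idea only: each of seven housekeeping loci is assigned an allele number, and the ordered allele profile is looked up in a table of known sequence types. This is not SISTR's implementation, which the abstract does not describe; the function name, allele numbers and ST labels below are hypothetical.

```python
# Loci of the seven-gene Salmonella MLST scheme.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Hypothetical profile table: allele numbers and ST labels are invented
# for illustration and do not correspond to real database entries.
PROFILE_TO_ST = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 5): "ST-B",
}


def assign_sequence_type(alleles, table=PROFILE_TO_ST):
    """Return the sequence type for an allele profile, or None if it is novel.

    `alleles` maps each locus name to the allele number called for an isolate.
    """
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"no allele call for loci: {', '.join(missing)}")
    # The profile is the allele numbers in a fixed locus order.
    profile = tuple(alleles[locus] for locus in LOCI)
    return table.get(profile)


if __name__ == "__main__":
    isolate = {"aroC": 1, "dnaN": 1, "hemD": 2, "hisD": 1,
               "purE": 1, "sucA": 1, "thrA": 5}
    print(assign_sequence_type(isolate))
```

A profile absent from the table returns `None`, which in this toy model stands in for a candidate novel sequence type.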
Li, Qili; Bu, Junyan; Yu, Zhihe; Tang, Lihua; Huang, Suiping; Guo, Tangxun; Mo, Jianyou; Hsiang, Tom
2018-02-22
Here, we present a draft genome sequence of isolate 15060 of Colletotrichum fructicola, a causal agent of mango anthracnose. The final assembly consists of 1,048 scaffolds totaling 56,493,063 bp (G+C content, 53.38%) and 15,180 predicted genes. Copyright © 2018 Li et al.
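Summary figures such as total assembly length and G+C content can be derived directly from the scaffold sequences. The sketch below is one common way to compute them, assuming sequences are already loaded as plain strings and that ambiguous bases (for example N) are excluded from the G+C denominator; the function names are illustrative and are not taken from any cited study.

```python
def total_length(sequences):
    """Total assembly length in bp, counting every character."""
    return sum(len(seq) for seq in sequences)


def gc_content(sequences):
    """G+C percentage over all unambiguous bases (A, C, G, T)."""
    gc = 0
    acgt = 0
    for seq in sequences:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        a_t = s.count("A") + s.count("T")
        gc += g_c
        acgt += g_c + a_t
    if acgt == 0:
        raise ValueError("no unambiguous bases in input")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    scaffolds = ["GGCC", "AATT", "ACNNGT"]
    print(total_length(scaffolds), round(gc_content(scaffolds), 2))
```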
Carroll, Laura M.; Miller, Rachel A.; Wiedmann, Martin
2017-01-01
ABSTRACT The Bacillus cereus group comprises nine species, several of which are pathogenic. Differentiating between isolates that may cause disease and those that do not is a matter of public health and economic importance, but it can be particularly challenging due to the high genomic similarity within the group. To this end, we have developed BTyper, a computational tool that employs a combination of (i) virulence gene-based typing, (ii) multilocus sequence typing (MLST), (iii) panC clade typing, and (iv) rpoB allelic typing to rapidly classify B. cereus group isolates using nucleotide sequencing data. BTyper was applied to a set of 662 B. cereus group genome assemblies to (i) identify anthrax-associated genes in non-B. anthracis members of the B. cereus group, and (ii) identify assemblies from B. cereus group strains with emetic potential. With BTyper, the anthrax toxin genes cya, lef, and pagA were detected in 8 genomes classified by the NCBI as B. cereus that clustered into two distinct groups using k-medoids clustering, while either the B. anthracis poly-γ-d-glutamate capsule biosynthesis genes capABCDE or the hyaluronic acid capsule hasA gene was detected in an additional 16 assemblies classified as either B. cereus or Bacillus thuringiensis isolated from clinical, environmental, and food sources. The emetic toxin genes cesABCD were detected in 24 assemblies belonging to panC clades III and VI that had been isolated from food, clinical, and environmental settings. The command line version of BTyper is available at https://github.com/lmc297/BTyper. In addition, BMiner, a companion application for analyzing multiple BTyper output files in aggregate, can be found at https://github.com/lmc297/BMiner. IMPORTANCE Bacillus cereus is a foodborne pathogen that is estimated to cause tens of thousands of illnesses each year in the United States alone. Even with molecular methods, it can be difficult to distinguish nonpathogenic B. cereus group isolates from their pathogenic counterparts, including the human pathogen Bacillus anthracis, which is responsible for anthrax, as well as the insect pathogen B. thuringiensis. By using the variety of typing schemes employed by BTyper, users can rapidly classify, characterize, and assess the virulence potential of any isolate using its nucleotide sequencing data. PMID:28625989
Tamariz, Jesus; Llanos, Carlos; Seas, Carlos; Montenegro, Paola; Lagos, Jose; Fernandes, Miriam R; Cerdeira, Louise; Lincopan, Nilton
2018-03-29
We present here the draft genome sequence of the first New Delhi metallo-β-lactamase (NDM-1)-producing Escherichia coli strain, belonging to sequence type 155 (ST155), isolated in Peru. Assembly of this draft genome resulted in 5,061,184 bp, revealing a clinically significant resistome for β-lactams, aminoglycosides, tetracyclines, phenicols, sulfonamides, trimethoprim, and fluoroquinolones. Copyright © 2018 Tamariz et al.
Xu, Youqiang; Liu, Yang; Yao, Su; Li, Jinxia; Cheng, Chi
2014-08-28
Noni is a plant reported to have nutritional and therapeutic properties. Paenibacillus polymyxa CICC 10580 is a strain that was isolated from the fruit of noni and showed comprehensive antagonistic activity against many pathogens. Its genome was sequenced and assembled (6.10 Mb). The coding sequences (CDSs) correlated with antagonistic activity were annotated. Copyright © 2014 Xu et al.
Radford, Devon R; Leon-Velarde, Carlos G; Chen, Shu; Hamidi Oskouei, Amir M; Balamurugan, Sampathkumar
2018-03-29
The genomes of two strains of Salmonella enterica subsp. enterica serovar Cubana and serovar Muenchen, isolated from dry hazelnuts and chia seeds, respectively, were sequenced using the Illumina MiSeq platform, assembled de novo using the overlap-layout-consensus method, and aligned to their respective most identical sequence genome scaffolds using MUMMER and BLAST searches. Copyright © 2018 Radford et al.
Draft Genome Sequence of Lactobacillus paracasei DmW181, a Bacterium Isolated from Wild Drosophila.
Hammer, Austin J; Walters, Amber; Carroll, Courtney; Newell, Peter D; Chaston, John M
2017-07-06
The draft genome sequence of Lactobacillus paracasei DmW181, an anaerobic bacterium isolated from wild Drosophila flies, is reported here. Strain DmW181 possesses genes for sialic acid and mannose metabolism. The assembled genome is 3,201,429 bp, with 3,454 predicted genes. Copyright © 2017 Hammer et al.
Johnson, Shannon Lyn; Khiani, A.; Bishop-Lilly, K. A.; ...
2015-05-14
We report the completed genome sequences for two non-O1/non-O139 Vibrio cholerae isolates. Each isolate has only a single chromosome, as opposed to the normal paradigm of two chromosomes found in all other V. cholerae isolates.
Riveros-Mckay, Fernando; Campos, Itzia; Giles-Gómez, Martha; Bolívar, Francisco; Escalante, Adelfo
2014-11-06
Leuconostoc mesenteroides P45 was isolated from the traditional Mexican pulque beverage. We report its draft genome sequence, assembled in 6 contigs consisting of 1,874,188 bp and no plasmids. Genome annotation predicted a total of 1,800 genes, 1,687 coding sequences, 52 pseudogenes, 9 rRNAs, 51 tRNAs, 1 noncoding RNA, and 44 frameshifted genes. Copyright © 2014 Riveros-Mckay et al.
USDA-ARS's Scientific Manuscript database
We report the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1 isolated in Minnesota, USA. The R1-1 genome, generated by de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies....
Near complete genome sequence of Clostridium paradoxum strain JW-YL-7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lancaster, Andrew; Utturkar, Sagar M.; Poole, Farris
2016-05-05
Clostridium paradoxum strain JW-YL-7 is a moderately thermophilic anaerobic alkaliphile isolated from the municipal sewage treatment plant in Athens, GA. We report the near-complete genome sequence of C. paradoxum strain JW-YL-7 obtained by using PacBio DNA sequencing and Pilon for sequence assembly refinement with Illumina data.
Nanthini, Jayaram; Chia, Kim-Hou; Thottathil, Gincy P; Taylor, Todd D; Kondo, Shinji; Najimudin, Nazalan; Baybayan, Primo; Singh, Siddharth; Sudesh, Kumar
2015-11-20
Streptomyces sp. strain CFMR 7, which naturally degrades rubber, was isolated from a rubber plantation. Whole genome sequencing and assembly resulted in 2 contigs with total genome size of 8.248 Mb. Two latex clearing protein (lcp) genes which are responsible for rubber degrading activities were identified. Copyright © 2015 Elsevier B.V. All rights reserved.
Dichosa, Armand E. K.; Davenport, Karen W.; Li, Po-E; ...
2015-03-19
We report here the genome sequence of Thauera sp. strain SWB20, isolated from a Singaporean wastewater treatment facility using gel microdroplets (GMDs) and single-cell genomics (SCG). This approach provided a single clonal microcolony that was sufficient to obtain a 4.9-Mbp genome assembly of an ecologically relevant Thauera species.
Hyson, Peter; Shapiro, Joshua A; Wien, Michelle W
2015-10-08
Exiguobacterium sp. strain BMC-KP was isolated as part of a student environmental sampling project at Bryn Mawr College, PA. Sequencing of bacterial DNA assembled a 3.32-Mb draft genome. Analysis suggests the presence of genes for tolerance to cold and toxic metals, broad carbohydrate metabolism, and genes derived from phage. Copyright © 2015 Hyson et al.
Draft Genome Sequence of Lactobacillus plantarum Strain IPLA 88
Ladero, Victor; Alvarez-Sieiro, Patricia; Redruello, Begoña; del Rio, Beatriz; Linares, Daniel M.; Martin, M. Cruz; Fernández, María
2013-01-01
Here, we report a 3.2-Mbp draft assembly for the genome of Lactobacillus plantarum IPLA 88. The sequence of this sourdough isolate provides insight into the adaptation of this versatile species to different environments. PMID:23887921
Genome Sequence of an Endophytic Fungus, Fusarium solani JS-169, Which Has Antifungal Activity.
Kim, Jung A; Jeon, Jongbum; Park, Sook-Young; Kim, Ki-Tae; Choi, Gobong; Lee, Hyun-Jung; Kim, Yangsun; Yang, Hee-Sun; Yeo, Joo-Hong; Lee, Yong-Hwan; Kim, Soonok
2017-10-19
An endophytic fungus, Fusarium solani strain JS-169, isolated from a mulberry twig, showed considerable antifungal activity. Here, we report the draft genome sequence of this strain. The assembly comprises 17 scaffolds, with an N50 value of 4.93 Mb. The assembled genome was 45,813,297 bp in length, with a G+C content of 49.91%. Copyright © 2017 Kim et al.
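For readers unfamiliar with the N50 statistic reported above: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the example lengths are illustrative, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```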
Complete genome sequence of a novel genotype of squash mosaic virus
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...
Lu, You; Samac, Deborah A.; Glazebrook, Jane
2015-01-01
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. PMID:25953184
Sen, Diya; Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sanghi, Neha; Ghorai, Arpita; Mishra, Gyan Prakash; Madduluri, Madhavi
2015-01-01
We report here the draft genome sequence of Scytonema millei VB511283, a cyanobacterium isolated from biofilms on the exterior of stone monuments in Santiniketan, eastern India. The draft genome is 11,627,246 bp long (11.63 Mb), with 118 scaffolds. About 9,011 protein-coding genes, 117 tRNAs, and 12 rRNAs are predicted from this assembly. PMID:25744984
Miller, Marisa E.; Zhang, Ying; Omidvar, Vahid; Sperschneider, Jana; Raley, Castle; Palmer, Jonathan M.; Garnica, Diana; Upadhyaya, Narayana; Rathjen, John; Taylor, Jennifer M.; Park, Robert F.; Dodds, Peter N.; Hirsch, Cory D.
2018-01-01
ABSTRACT Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae. PMID:29463655
Lu, You; Samac, Deborah A; Glazebrook, Jane; Ishimaru, Carol A
2015-05-07
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. Copyright © 2015 Lu et al.
Sen, Diya; Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sanghi, Neha; Ghorai, Arpita; Mishra, Gyan Prakash; Madduluri, Madhavi; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-03-05
We report here the draft genome sequence of Scytonema millei VB511283, a cyanobacterium isolated from biofilms on the exterior of stone monuments in Santiniketan, eastern India. The draft genome is 11,627,246 bp long (11.63 Mb), with 118 scaffolds. About 9,011 protein-coding genes, 117 tRNAs, and 12 rRNAs are predicted from this assembly. Copyright © 2015 Sen et al.
Auffret, Pauline; Segura, Audrey; Klopp, Christophe; Bouchez, Olivier; Kérourédan, Monique; Bibbal, Delphine; Brugère, Hubert; Forano, Evelyne
2017-01-01
ABSTRACT Enterohemorrhagic Escherichia coli (EHEC) with serotype O157:H7 is a major foodborne pathogen. Here, we report the draft genome sequence of EHEC O157:H7 strain MC2 isolated from cattle in France. The assembly contains 5,400,376 bp that encoded 5,914 predicted genes (5,805 protein-encoding genes and 109 RNA genes). PMID:28983004
Arenavirus Coinfections Are Common in Snakes with Boid Inclusion Body Disease.
Hepojoki, J; Salmenperä, P; Sironen, T; Hetzel, U; Korzyukov, Y; Kipar, A; Vapalahti, O
2015-08-01
Recently, novel arenaviruses were found in snakes with boid inclusion body disease (BIBD); these form the new genus Reptarenavirus within the family Arenaviridae. We used next-generation sequencing and de novo sequence assembly to investigate reptarenavirus isolates from our previous study. Four of the six isolates and all of the samples from snakes with BIBD contained at least two reptarenavirus species. The viruses sequenced comprise four novel reptarenavirus species and a representative of a new arenavirus genus. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Cristancho, Marco A.; Botero-Rozo, David Octavio; Giraldo, William; Tabima, Javier; Riaño-Pachón, Diego Mauricio; Escobar, Carolina; Rozo, Yomara; Rivera, Luis F.; Durán, Andrés; Restrepo, Silvia; Eilam, Tamar; Anikster, Yehoshua; Gaitán, Alvaro L.
2014-01-01
Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish races/isolates. 
PMID:25400655
Febrer, Melanie; Goicoechea, Jose Luis; Wright, Jonathan; McKenzie, Neil; Song, Xiang; Lin, Jinke; Collura, Kristi; Wissotski, Marina; Yu, Yeisoo; Ammiraju, Jetty S. S.; Wolny, Elzbieta; Idziak, Dominika; Betekhtin, Alexander; Kudrna, Dave; Hasterok, Robert; Wing, Rod A.; Bevan, Michael W.
2010-01-01
The pooid subfamily of grasses includes some of the most important crop, forage and turf species, such as wheat, barley and Lolium. Developing genomic resources, such as whole-genome physical maps, for analysing the large and complex genomes of these crops and for facilitating biological research in grasses is an important goal in plant biology. We describe a bacterial artificial chromosome (BAC)-based physical map of the wild pooid grass Brachypodium distachyon and integrate this with whole genome shotgun sequence (WGS) assemblies using BAC end sequences (BES). The resulting physical map contains 26 contigs spanning the 272 Mb genome. BES from the physical map were also used to integrate a genetic map. This provides an independent validation and confirmation of the published WGS assembly. Mapped BACs were used in Fluorescence In Situ Hybridisation (FISH) experiments to align the integrated physical map and sequence assemblies to chromosomes with high resolution. The physical, genetic and cytogenetic maps, integrated with whole genome shotgun sequence assemblies, enhance the accuracy and durability of this important genome sequence and will directly facilitate gene isolation. PMID:20976139
Zhao, Kaixi; Margaria, Paolo; Rosa, Cristina
2018-05-10
Impatiens necrotic spot orthotospovirus (INSV) can impact economically important ornamental plants and vegetables worldwide. Characterization studies on INSV are limited. For most INSV isolates, there are no complete genome sequences available. This lack of genomic information has a negative impact on the understanding of INSV genetic diversity and evolution. Here we report the first complete nucleotide sequence of a US INSV isolate. INSV-UP01 was isolated from an impatiens in Pennsylvania, US. RT-PCR was used to clone its full-length genome and Vector NTI to assemble overlapping sequences. Phylogenetic trees were constructed by using MEGA7 software to show the phylogenetic relationships with other available INSV sequences worldwide. This US isolate has genome and biological features typical of the INSV species and clusters in the Western Hemisphere clade, but its origin appears to be recent. Furthermore, INSV-UP01 might have been involved in a recombination event with an Italian isolate belonging to the Asian clade. Our analyses support that INSV isolates infect a broad plant-host range, group by geographic origin rather than by host, and are subject to frequent recombination events. These results justify the need to generate and analyze complete genome sequences of orthotospoviruses in general and INSV in particular.
2018-01-01
ABSTRACT Mucorales are ubiquitous environmental molds responsible for mucormycosis in diabetic, immunocompromised, and severely burned patients. Small outbreaks of invasive wound mucormycosis (IWM) have already been reported in burn units without extensive microbiological investigations. We faced an outbreak of IWM in our center and investigated the clinical isolates with whole-genome sequencing (WGS) analysis. We analyzed M. circinelloides isolates from patients in our burn unit (BU1, Hôpital Saint-Louis, Paris, France) together with nonoutbreak isolates from Burn Unit 2 (BU2, Paris area) and from France over a 2-year period (2013 to 2015). A total of 21 isolates, including 14 isolates from six BU1 patients, were analyzed by WGS. Phylogenetic classification based on de novo assembly and assembly-free approaches showed that the clinical isolates clustered in four highly divergent clades. Clade 1 contained at least one of the strains from the six epidemiologically linked BU1 patients. The clinical isolates were specific to each patient. Two patients were infected with more than two strains from different clades, suggesting that an environmental reservoir of clonally unrelated isolates was the source of contamination. Only two patients from BU1 shared one strain, which could correspond to direct transmission or contamination with the same environmental source. In conclusion, WGS of several isolates per patient coupled with precise epidemiological data revealed a complex situation combining potential cross-transmission between patients and multiple contaminations with a heterogeneous pool of strains from a cryptic environmental reservoir. PMID:29691339
Curtobacterium sp. Genome Sequencing Underlines Plant Growth Promotion-Related Traits
Bulgari, Daniela; Minio, Andrea; Casati, Paola; Quaglino, Fabio; Delledonne, Massimo
2014-01-01
Endophytic bacteria are microorganisms residing in plant tissues without causing disease symptoms. Here, we provide the high-quality genome sequence of Curtobacterium sp. strain S6, isolated from grapevine plant. The genome assembly contains 2,759,404 bp in 13 contigs and 2,456 predicted genes. PMID:25035321
California mild CTV strains that break resistance in Trifoliate Orange
USDA-ARS's Scientific Manuscript database
This is the final report of a project to characterize California isolates of Citrus tristeza virus (CTV) that replicate in Poncirus trifoliata (trifoliate orange). Next Generation Sequencing (NGS) of viral small interfering RNAs (siRNAs) and assembly of full-length sequences of mild California CTV i...
Draft Genome Sequence of the Putrescine-Producing Strain Lactococcus lactis subsp. lactis 1AA59
del Rio, Beatriz; Linares, Daniel M.; Fernandez, María; Mayo, Baltasar; Martín, M. Cruz
2015-01-01
We report here the 2,576,542-bp annotated draft genome assembly of Lactococcus lactis subsp. lactis 1AA59. This strain, isolated from a traditional cheese, produces putrescine, one of the biogenic amines most frequently found in dairy products. PMID:26089428
Judge, Kim; Hunt, Martin; Reuter, Sandra; Tracey, Alan; Quail, Michael A; Parkhill, Julian; Peacock, Sharon J
2016-09-01
Translating the Oxford Nanopore MinION sequencing technology into medical microbiology requires on-going analysis that keeps pace with technological improvements to the instrument and release of associated analysis software. Here, we use a multidrug-resistant Enterobacter kobei isolate as a model organism to compare open source software for the assembly of genome data, and relate this to the time taken to generate actionable information. Three software tools (PBcR, Canu and miniasm) were used to assemble MinION data and a fourth (SPAdes) was used to combine MinION and Illumina data to produce a hybrid assembly. All four had a similar number of contigs and were more contiguous than the assembly using Illumina data alone, with SPAdes producing a single chromosomal contig. Evaluation of the four assemblies to represent the genome structure revealed a single large inversion in the SPAdes assembly, which also incorrectly integrated a plasmid into the chromosomal contig. Almost 50 %, 80 % and 90 % of MinION pass reads were generated in the first 6, 9 and 12 h, respectively. Using data from the first 6 h alone led to a less accurate, fragmented assembly, but data from the first 9 or 12 h generated similar assemblies to that from 48 h sequencing. Assemblies were generated in 2 h using Canu, indicating that going from isolate to assembled data is possible in less than 48 h. MinION data identified that genes responsible for resistance were carried by two plasmids encoding resistance to carbapenem and to sulphonamides, rifampicin and aminoglycosides, respectively.
Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria)
Bendiks, Zachary A.; Lang, Jenna M.; Darling, Aaron E.; Coil, David A.
2013-01-01
Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment. PMID:23516225
Methods For Self-Organizing Software
Bouchard, Ann M.; Osbourn, Gordon C.
2005-10-18
A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered function calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic, with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.
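The idea of an execution sequence emerging from bonding order, rather than from the order in which code is written, can be shown with a toy model. This is only a loose sketch under strong assumptions: each machine has a single accepting site and a single emitting site, one encapsulant holds the whole population, and a machine is chosen at random among those whose accepting site matches the currently open site. The class, function and site names are hypothetical and are not drawn from the patent.

```python
import random


class Machine:
    """A unit with one accepting site, one emitting site and an action."""

    def __init__(self, name, accepts, emits, action):
        self.name = name
        self.accepts = accepts   # site label this machine can bond to
        self.emits = emits       # site label exposed after actuation
        self.action = action     # deterministic function applied on bonding


def run_encapsulant(machines, initial_site, value, rng=None):
    """Let machines in one pool bond and actuate until no bond is possible."""
    rng = rng or random.Random()
    open_site = initial_site
    unbonded = list(machines)
    trace = []
    while True:
        candidates = [m for m in unbonded if m.accepts == open_site]
        if not candidates:
            break
        machine = rng.choice(candidates)   # stochastic bonding
        unbonded.remove(machine)
        value = machine.action(value)      # deterministic actuation
        trace.append(machine.name)
        open_site = machine.emits
    return trace, value


if __name__ == "__main__":
    # Machines are listed out of order; site matching decides the sequence.
    pool = [
        Machine("report", "count", "done", lambda n: f"{n} tokens"),
        Machine("parse", "raw", "tokens", lambda text: text.split()),
        Machine("count", "tokens", "count", len),
    ]
    print(run_encapsulant(pool, "raw", "a b a"))
```

Because only one machine matches each open site in this example, the resulting order is the same on every run even though the choice is made at random; adding a second machine with the same accepting site would make the sequence vary between runs.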
Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M
2016-01-01
During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.
Dhar, Hena; Swarnkar, Mohit Kumar; Gulati, Arvind; Singh, Anil Kumar; Kasana, Ramesh Chand
2015-02-19
Paenibacillus sp. strain IHB B 3415 is a cellulase-producing psychrotrophic bacterium isolated from a soil sample from the cold deserts of Himachal Pradesh, India. Here, we report an 8.44-Mb assembly of its genome sequence with a G+C content of 50.77%. The data presented here will provide insights into the mechanisms of cellulose degradation at low temperature. Copyright © 2015 Dhar et al.
Garza-Ramos, Ulises; Tamayo-Legorreta, Elsa; Arellano-Quintanilla, Doris María; Rodriguez-Medina, Nadia; Silva-Sanchez, Jesús; Catalan-Najera, Juan; Rocha-Martínez, Marisol Karina; Bravo-Díaz, María Asunción
2018-01-01
A colistin-resistant mcr-1-carrying Escherichia coli strain, RC2-007, was isolated from a swine farm in Mexico. This extraintestinal and uropathogenic strain of E. coli belongs to serotype O89:H9 and sequence type 744. Assembly and annotation resulted in a 4.9-Mb draft genome that revealed the presence of plasmid-mediated mcr-1-ISApI1 genes as part of a prophage. PMID:29519827
Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W
2016-06-13
Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.
De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.
Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min
2013-02-01
Lentinula edodes has been utilized as food as well as in popular medicine; moreover, extracts isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about the genes involved in these properties, and the absence of L. edodes genomes has been a barrier to the development of functional genomics research. However, high-throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L. edodes, we leveraged Solexa sequencing technology in de novo assembly of the L. edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size = 689 bp) including 18,120 unigenes with coding sequence (CDS). Based on similarity search with known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous group (COG) terms. Our data provide the first comprehensive sequence resource available for functional genomics studies in L. edodes, and demonstrate the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.
Li, Chien-Feng; Tang, Hui-Ling; Chiou, Chien-Shun; Tung, Kwong-Chung; Lu, Min-Chi; Lai, Yi-Chyi
2018-03-01
Klebsiella spp. are regarded as major pathogens causing infections in humans and various animals. Here we report the draft genome sequence of a CTX-M-type β-lactamase-producing Klebsiella quasipneumoniae subsp. similipneumoniae strain CHKP0062 isolated from a Yellow-margined Box turtle. An Illumina-Solexa platform was used to sequence the genome of CHKP0062. Qualified reads were assembled de novo using Velvet. The draft genome was annotated by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). The resistome and virulome of the strain were investigated. A total of 5423 protein-coding sequences, 87 tRNAs, 24 rRNAs and 12 ncRNAs were identified in the 5 699 275-bp genome. CHKP0062 was assigned to sequence type ST2131 with the K-loci type as KL67. No virulence-associated genes were identified. However, numerous antimicrobial resistance genes were present in this strain. Plasmid contigs were assembled and revealed homology to the multidrug resistance plasmids pC15-K, pCTX-M3 and pKF3-94, with the carriage of the class A β-lactamase genes blaTEM-1b and blaCTX-M-3. The genome sequence reported in this study will be useful for comparative genomic analysis regarding the dissemination of clinically important antibiotic resistance genes among Klebsiella spp. isolated from humans and animals. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Curtobacterium sp. Genome Sequencing Underlines Plant Growth Promotion-Related Traits.
Bulgari, Daniela; Minio, Andrea; Casati, Paola; Quaglino, Fabio; Delledonne, Massimo; Bianco, Piero A
2014-07-17
Endophytic bacteria are microorganisms residing in plant tissues without causing disease symptoms. Here, we provide the high-quality genome sequence of Curtobacterium sp. strain S6, isolated from grapevine plant. The genome assembly contains 2,759,404 bp in 13 contigs and 2,456 predicted genes. Copyright © 2014 Bulgari et al.
NASA Technical Reports Server (NTRS)
Venkateswaran, Kasthuri; Kempf, Michael; Chen, Fei; Satomi, Masataka; Nicholson, Wayne; Kern, Roger
2003-01-01
One of the spore-formers isolated from a spacecraft-assembly facility, belonging to the genus Bacillus, is described on the basis of phenotypic characterization, 16S rDNA sequence analysis and DNA-DNA hybridization studies. It is a Gram-positive, facultatively anaerobic, rod-shaped eubacterium that produces endospores. The spores of this novel bacterial species exhibited resistance to UV, gamma-radiation, H2O2 and desiccation. The 16S rDNA sequence analysis revealed a clear affiliation between this strain and members of the low G+C Firmicutes. High 16S rDNA sequence similarity values were found with members of the genus Bacillus and this was supported by fatty acid profiles. The 16S rDNA sequence similarity between strain FO-92T and Bacillus benzoevorans DSM 5391T was very high. However, molecular characterizations employing small-subunit 16S rDNA sequences were at the limits of resolution for the differentiation of species in this genus, but DNA-DNA hybridization data support the proposal of FO-92T as Bacillus nealsonii sp. nov. (type strain is FO-92T =ATCC BAAM-519T =DSM 15077T).
2013-01-01
Background: With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results: The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions: Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206
Bainomugisa, Arnold; Duarte, Tania; Lavu, Evelyn; Pandey, Sushil; Coulter, Chris; Marais, Ben J; Coin, Lachlan M
2018-06-15
A better understanding of the genomic changes that facilitate the emergence and spread of drug-resistant Mycobacterium tuberculosis strains is currently required. Here, we report the use of the MinION nanopore sequencer (Oxford Nanopore Technologies) to sequence and assemble an extensively drug-resistant (XDR) isolate, which is part of a modern Beijing sub-lineage strain, prevalent in Western Province, Papua New Guinea. Using 238-fold coverage obtained from a single flow-cell, de novo assembly of nanopore reads resulted in one contiguous assembly with 99.92 % assembly accuracy. Incorporation of complementary short read sequences (Illumina) as part of consensus error correction resulted in a 4 404 064 bp genome with 99.98 % assembly accuracy. This assembly had an average nucleotide identity of 99.7 % relative to the reference genome, H37Rv. We assembled nearly all GC-rich repetitive PE/PPE family genes (166/168) and identified variants within these genes. With an estimated genotypic error rate of 5.3 % from MinION data, we demonstrated identification of variants to include the conventional drug resistance mutations, and those that contribute to the resistance phenotype (efflux pumps/transporter) and virulence. Reference-based alignment of the assembly allowed detection of deletions and insertions. MinION sequencing provided a fully annotated assembly of a transmissible XDR strain from an endemic setting and showed its utility to provide further understanding of genomic processes within Mycobacterium tuberculosis.
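To put the accuracy and coverage figures of this record in absolute terms, the following Python sketch converts them into approximate base counts. It is illustrative arithmetic only: it assumes that both accuracy values and the 238-fold coverage can be applied to the reported 4 404 064 bp genome length (the abstract does not give the length of the nanopore-only assembly), and the function names are hypothetical.

```python
GENOME_BP = 4_404_064  # polished genome length reported in the abstract


def expected_errors(length_bp: int, accuracy_pct: float) -> float:
    """Approximate number of erroneous bases implied by a percent accuracy."""
    return length_bp * (1 - accuracy_pct / 100)


def sequenced_bases(length_bp: int, fold_coverage: int) -> int:
    """Approximate total bases sequenced at a given fold coverage."""
    return length_bp * fold_coverage


# Assumption: both accuracies are applied to the same 4,404,064 bp length.
nanopore_only = round(expected_errors(GENOME_BP, 99.92))
with_illumina = round(expected_errors(GENOME_BP, 99.98))
total_bases = sequenced_bases(GENOME_BP, 238)

print(f"nanopore-only assembly: about {nanopore_only} erroneous bases")
print(f"after Illumina polishing: about {with_illumina} erroneous bases")
print(f"238-fold coverage: about {total_bases / 1e9:.2f} Gb of reads")
```

Under these assumptions, moving from 99.92 % to 99.98 % accuracy corresponds to going from roughly 3,500 to roughly 900 erroneous bases across the genome.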
Transcriptome of the Caribbean stony coral Porites astreoides from three developmental stages.
Mansour, Tamer A; Rosenthal, Joshua J C; Brown, C Titus; Roberson, Loretta M
2016-08-02
Porites astreoides is a ubiquitous species of coral on modern Caribbean reefs that is resistant to increasing temperatures, overfishing, and other anthropogenic impacts that have threatened most other coral species. We assembled and annotated a transcriptome from this coral using Illumina sequences from three different developmental stages collected over several years: free-swimming larvae, newly settled larvae, and adults (>10 cm in diameter). This resource will aid understanding of coral calcification, larval settlement, and host-symbiont interactions. A de novo transcriptome for the P. astreoides holobiont (coral plus algal symbiont) was assembled using 594 Mbp of raw Illumina sequencing data generated from five age-specific cDNA libraries. The new transcriptome consists of 867 255 transcript elements with an average length of 685 bases. The isolated P. astreoides assembly consists of 129 718 transcript elements with an average length of 811 bases, and the isolated Symbiodinium sp. assembly had 186 177 transcript elements with an average length of 1105 bases. This contribution to coral transcriptome data provides a valuable resource for researchers studying the ontogeny of gene expression patterns within both the coral and its dinoflagellate symbiont.
USDA-ARS's Scientific Manuscript database
Clostridium perfringens strain LLY_N11 is a commensal bacterial isolate from a healthy chicken that produced a necrotic enteritis in experimental studies. Here we present the assembly and annotation of its genome, which may provide further insights into improved understanding of the molecular mechan...
Yakym, Christopher J.; Helmkampf, Martin; Hagiwara, Kehau; Ip, Courtney G.; Antonio, Brandi J.; Armstrong, Ellie; Ulloa, Wesley J.; Awaya, Jonathan D.
2016-01-01
We report here the 6.0-Mb draft genome assembly of Pseudoalteromonas luteoviolacea strain IPB1 that was isolated from the Hawaiian marine sponge Iotrochota protea. Genome mining complemented with bioassay studies will elucidate secondary metabolite biosynthetic pathways and will help explain the ecological interaction between host sponge and microorganism. PMID:27660784
Whole-Genome Sequencing of Lactobacillus salivarius Strains BCRC 14759 and BCRC 12574
Chiu, Shih-Hau; Wang, Li-Ting; Huang, Lina
2017-01-01
Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. PMID:29167259
Saha, Surya; Hunter, Wayne B.; Reese, Justin; Morgan, J. Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen
2012-01-01
Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China. PMID:23166822
Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian
2016-07-12
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies will replace resequencing or other reference-dependent methods.
Garza-Ramos, Ulises; Tamayo-Legorreta, Elsa; Arellano-Quintanilla, Doris María; Rodriguez-Medina, Nadia; Silva-Sanchez, Jesús; Catalan-Najera, Juan; Rocha-Martínez, Marisol Karina; Bravo-Díaz, María Asunción; Alpuche-Aranda, Celia
2018-03-08
A colistin-resistant mcr-1-carrying Escherichia coli strain, RC2-007, was isolated from a swine farm in Mexico. This extraintestinal and uropathogenic strain of E. coli belongs to serotype O89:H9 and sequence type 744. Assembly and annotation resulted in a 4.9-Mb draft genome that revealed the presence of plasmid-mediated mcr-1-ISApI1 genes as part of a prophage. Copyright © 2018 Garza-Ramos et al.
Expanding the Species and Chemical Diversity of Penicillium Section Cinnamopurpurea
Peterson, Stephen W.; Jurjević, Željko; Frisvad, Jens C.
2015-01-01
A set of isolates very similar to or potentially conspecific with an unidentified Penicillium isolate, NRRL 735, was assembled using a BLAST search of ITS similarity among described (GenBank) and undescribed Penicillium isolates in our laboratories. DNA was amplified from six loci of the assembled isolates and sequenced. Two species in section Cinnamopurpurea are self-compatible sexual species, but the asexual species had polymorphic loci suggestive of sexual reproduction and variation in conidium size suggestive of ploidy level differences typical of heterothallism. Accordingly, we use genealogical concordance analysis, a technique valid only in heterothallic organisms, for putatively asexual species. Seven new species were revealed in the analysis and are described here. Extrolite analysis showed that two of the new species, P. colei and P. monsserratidens, produce the mycotoxin citreoviridin, which has demonstrated pharmacological activity against human lung tumors. These isolates could provide leads in pharmaceutical research. PMID:25853891
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil
2017-05-04
and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not understood. Therefore, only the two genome segments with detectable sequence homologies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
Whole-Genome Sequencing of Lactobacillus salivarius Strains BCRC 14759 and BCRC 12574.
Chiu, Shih-Hau; Chen, Chien-Chi; Wang, Li-Ting; Huang, Lina
2017-11-22
Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. Copyright © 2017 Chiu et al.
Young, Lydia M.; Tu, Ling-Hsien; Raleigh, Daniel P.; Ashcroft, Alison E.
2017-01-01
Although amyloid assembly in vitro is commonly investigated using single protein sequences, fibril formation in vivo can be more heterogeneous, involving co-assembly of proteins of different length, sequence and/or post-translational modifications. Emerging evidence suggests that co-polymerization can alter the rate and/or mechanism of aggregation and can contribute to pathogenicity. Electrospray ionization-ion mobility spectrometry-mass spectrometry (ESI-IMS-MS) is uniquely suited to the study of these heterogeneous ensembles. Here, ESI-IMS-MS combined with analysis of fibrillation rates using thioflavin T (ThT) fluorescence, is used to track the course of aggregation of variants of islet-amyloid polypeptide (IAPP) in isolation and in pairwise mixtures. We identify a sub-population of extended monomers as the key precursors of amyloid assembly, and reveal that the fastest aggregating sequence in peptide mixtures determines the lag time of fibrillation, despite being unable to cross-seed polymerization. The results demonstrate that co-polymerization of IAPP sequences radically alters the rate of amyloid assembly by altering the conformational properties of the mixed oligomers that form. PMID:28970890
Draft Genome Sequence of Hafnia paralvei Strain GTA-HAF03.
Kohlman, Melissa E; Carrillo, Catherine D; Wong, Alex
2015-02-19
Hafnia paralvei is a Gram-negative member of the Enterobacteriaceae family, closely related to the opportunistic pathogen Hafnia alvei. We report here the first draft genome sequence of H. paralvei, from the beef trim isolate GTA-HAF03, consisting of a 5.0-Mbp assembly encoding 4,382 proteins and 90 predicted RNAs. Copyright © 2015 Kohlman et al.
Genome Sequence Analysis of the Biogenic Amine-Degrading Strain Lactobacillus casei 5b
Ladero, Victor; Herrero-Fresno, Ana; Martinez, Noelia; del Río, Beatriz; Linares, Daniel M.; Fernández, María; Martín, María Cruz
2014-01-01
We here report a 3.02-Mbp annotated draft assembly of the Lactobacillus casei 5b genome. The sequence of this biogenic amine-degrading dairy isolate may help identify the mechanisms involved in the catabolism of biogenic amines and perhaps shed light on ways to reduce the presence of these toxic compounds in food. PMID:24435875
Whole genome sequence and comparative analysis of Borrelia burgdorferi MM1
Jabbari, Neda; Reddy, Panga Jaipal; Hood, Leroy
2018-01-01
Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain. PMID:29889842
Tripathi, Charu; Mahato, Nitish K; Rani, Pooja; Singh, Yogendra; Kamra, Komal; Lal, Rup
2016-01-01
Lampropedia cohaerens strain CT6(T), a non-motile, aerobic and coccoid strain was isolated from arsenic rich microbial mats (temperature ~45 °C) of a hot water spring located atop the Himalayan ranges at Manikaran, India. The present study reports the first genome sequence of type strain CT6(T) of genus Lampropedia cohaerens. Sequencing data was generated using the Illumina HiSeq 2000 platform and assembled with ABySS v 1.3.5. The 3,158,922 bp genome was assembled into 41 contigs with a mean GC content of 63.5 % and 2823 coding sequences. Strain CT6(T) was found to harbour genes involved in both the Entner-Doudoroff pathway and non-phosphorylated ED pathway. Strain CT6(T) also contained genes responsible for imparting resistance to arsenic, copper, cobalt, zinc, cadmium and magnesium, providing survival advantages at a thermal location. Additionally, the presence of genes associated with biofilm formation, pyrroloquinoline-quinone production, isoquinoline degradation and mineral phosphate solubilisation in the genome demonstrate the diverse genetic potential for survival at stressed niches.
Behrendt, Undine; Schumann, Peter; Stieglmeier, Michaela; Pukall, Rüdiger; Augustin, Jürgen; Spröer, Cathrin; Schwendner, Petra; Moissl-Eichinger, Christine; Ulrich, Andreas
2010-10-01
In the course of studying the influence of N-fertilization on N(2) and N(2)O flux rates in relation to soil bacterial community composition of a long-term fertilization experiment in fen peat grassland, a strain group was isolated that was related to a strain isolated from a spacecraft assembly clean room during diversity studies of microorganisms, which withstood cleaning and bioburden reduction strategies. Both the fen soil isolates and the clean room strain revealed versatile physiological capacities in N-transformation processes by performing heterotrophic nitrification, respiratory ammonification and denitrification activity. Phylogenetic analysis based on 16S rRNA gene sequences demonstrated that the investigated isolates belonged to the genus Paenibacillus. Sequence similarities lower than 97% in comparison to established species indicated a separate species position. Except for the peptidoglycan type (A4alpha L-Lys-D-Asp), chemotaxonomic features of the isolates matched the genus description, but differences in several physiological characteristics separated them from related species and supported their novel species status. Despite a high 16S rRNA gene sequence similarity between the clean room isolate ES_MS17(T) and the representative fen soil isolate N3/975(T), DNA-DNA hybridization studies revealed genetic differences at the species level. These differences were substantiated by MALDI-TOF MS analysis, ribotyping and several distinct physiological characteristics. On the basis of these results, it was concluded that the fen soil isolates and the clean room isolate ES_MS17(T) represented two novel species for which the names Paenibacillus uliginis sp. nov. (type strain N3/975(T)=DSM 21861(T)=LMG 24790(T)) and Paenibacillus purispatii sp. nov. (type strain ES_MS17(T)=DSM 22991(T)=CIP 110057(T)) are proposed. Copyright © 2010 Elsevier GmbH. All rights reserved.
Marcy, Yann; Ouverney, Cleber; Bik, Elisabeth M.; Lösekann, Tina; Ivanova, Natalia; Martin, Hector Garcia; Szeto, Ernest; Platt, Darren; Hugenholtz, Philip; Relman, David A.; Quake, Stephen R.
2007-01-01
We have developed a microfluidic device that allows the isolation and genome amplification of individual microbial cells, thereby enabling organism-level genomic analysis of complex microbial ecosystems without the need for culture. This device was used to perform a directed survey of the human subgingival crevice and to isolate bacteria having rod-like morphology. Several isolated microbes had a 16S rRNA sequence that placed them in candidate phylum TM7, which has no cultivated or sequenced members. Genome amplification from individual TM7 cells allowed us to sequence and assemble >1,000 genes, providing insight into the physiology of members of this phylum. This approach enables single-cell genetic analysis of any uncultivated minority member of a microbial community. PMID:17620602
Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo
2016-01-01
Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from <1% to 53% of amino acids in each gene exhibiting at least one substitution within the pool of samples. The UL23 gene had one of the highest genetic variabilities at 35.2% in keeping with its role in development of drug resistance. The assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal. PMID:27309375
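The N(G)50 and N(G)75 statistics mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline; the function name `nx` and its parameters are hypothetical, and the definition used is the common one: the length of the contig at which the cumulative sum of contig lengths, sorted longest first, reaches the chosen fraction of the assembly size (Nx) or of an assumed genome size (NGx).

```python
def nx(contig_lengths, fraction=0.5, genome_size=None):
    """Return Nx, or NGx when an (assumed) genome_size is supplied.

    contig_lengths: iterable of contig lengths in bp.
    fraction: 0.5 for N50/NG50, 0.75 for N75/NG75.
    genome_size: hypothetical reference size in bp; if None, the
                 total assembly length is used (plain Nx).
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    total = genome_size if genome_size is not None else sum(lengths)
    target = total * fraction
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    # The assembly is too short to cover the requested fraction of the
    # genome; this can only happen for NGx.
    return 0
```

For example, `nx(lengths, 0.75, genome_size=152_000)` would give an NG75 against a genome of roughly 152 kbp, the approximate HHV-1 genome size stated in the abstract.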
Borelli, Guilherme; José, Juliana; Teixeira, Paulo José Pereira Lima; dos Santos, Leandro Vieira
2016-01-01
Candida boidinii and Candida sojae yeasts were isolated from energy cane bagasse and plague-insects. Both have fast xylose uptake rate and produce great amounts of xylitol, which are interesting features for food and 2G ethanol industries. Because they lack published genomes, we have sequenced and assembled them, offering new possibilities for gene prospection. PMID:26769937
Draft genome sequence of field isolate Brucella melitensis strain 2007BM/1 from India.
Singh, D K; Kumar, Bablu; Shrinet, Garima; Singh, R P; Das, Aparajita; Mantur, B G; Abhishek; Pandey, Aruna; Mondal, Piyali; Sajjanar, B K; Doimari, Soni; Singh, Vijayata; Kumari, Reena; Tiwari, A K; Gandham, Ravi Kumar
2018-04-21
Brucellosis is one of the most widespread and important zoonotic diseases worldwide and is endemic in many parts of India. Brucella melitensis is considered the most pathogenic Brucella species for humans. Here we report the draft genome sequence of B. melitensis strain 2007BM/1 isolated from a human in India. Genomic DNA was extracted from Brucella culture and was sequenced using an Illumina MiSeq platform. The generated reads were assembled using three de novo assemblers and the draft genome was annotated. This monoisolate, with a genome length of 3,268,756 bp, was found to be resistant to azithromycin and trimethoprim/sulfamethoxazole but susceptible to tetracycline, ofloxacin, rifampicin, ciprofloxacin and doxycycline. The presence of virulence genes in the strain was identified. The results obtained will help in understanding drug resistance mechanisms and virulence factors in highly zoonotic B. melitensis and suggest the need for judicious use of antibiotics in livestock health and management practices. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Brown, Steven D.; Utturkar, Sagar M.; Magnuson, Timothy S.; ...
2014-09-04
Pelosinus fermentans strain R7 was isolated from Russian kaolin clays as the type strain and it can reduce Fe(III) during fermentative growth (1). Draft genome sequences for P. fermentans R7 and four strains from Hanford, Washington, USA, have been published (2–4). The P. fermentans 16S rRNA sequence dominated the lactate-based enrichment cultures from three geochemically contrasting soils from the Melton Branch Watershed, Oak Ridge, Tennessee, USA (5) and also at another stimulated, uranium-contaminated field site near Oak Ridge (6). For the current work, strain UFO1 was isolated from pristine sediments at a background field site in Oak Ridge and characterized as facilitating U(VI) reduction and precipitation with phosphate (7).
Beet western yellows virus infects the carnivorous plant Nepenthes mirabilis.
Miguel, Sissi; Biteau, Flore; Mignard, Benoit; Marais, Armelle; Candresse, Thierry; Theil, Sébastien; Bourgaud, Frédéric; Hehn, Alain
2016-08-01
Although poleroviruses are known to infect a broad range of higher plants, carnivorous plants have not yet been reported as hosts. Here, we describe the first polerovirus naturally infecting the pitcher plant Nepenthes mirabilis. The virus was identified through bioinformatic analysis of NGS transcriptome data. The complete viral genome sequence was assembled from overlapping PCR fragments and shown to share 91.1 % nucleotide sequence identity with the US isolate of beet western yellows virus (BWYV). Further analysis of other N. mirabilis plants revealed the presence of additional BWYV isolates differing by several insertion/deletion mutations in ORF5.
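A figure such as the 91.1 % nucleotide sequence identity quoted above is typically derived from a pairwise alignment. The following is a minimal sketch of one possible convention, not the method used in the study: it assumes the two sequences are already aligned to equal length, uses `-` for gaps, and excludes gapped columns from the denominator. The function name `percent_identity` is hypothetical, and other tools may count gaps differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both strings come from the same alignment, so they have
    equal length, and '-' marks a gap. Columns containing a gap in either
    sequence are skipped rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Because the gap convention changes the result, identity values are only comparable when computed under the same definition.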
Ivy, Reid A; Farber, Jeffrey M; Pagotto, Franco; Wiedmann, Martin
2013-01-01
Foodborne pathogen isolate collections are important for the development of detection methods, for validation of intervention strategies, and to develop an understanding of pathogenesis and virulence. We have assembled a publicly available Cronobacter (formerly Enterobacter sakazakii) isolate set that consists of (i) 25 Cronobacter sakazakii isolates, (ii) two Cronobacter malonaticus isolates, (iii) one Cronobacter muytjensii isolate, which displays some atypical phenotypic characteristics, biochemical profiles, and colony color on selected differential media, and (iv) two nonclinical Enterobacter asburiae isolates, which show some phenotypic characteristics similar to those of Cronobacter spp. The set consists of human (n = 10), food (n = 11), and environmental (n = 9) isolates. Analysis of partial 16S rDNA sequence and seven-gene multilocus sequence typing data allowed for reliable identification of these isolates to species and identification of 14 isolates as sequence type 4, which had previously been shown to be the most common C. sakazakii sequence type associated with neonatal meningitis. Phenotypic characterization was carried out with API 20E and API 32E test strips and streaking on two selective chromogenic agars; isolates were also assessed for sorbitol fermentation and growth at 45°C. Although these strategies typically produced the same classification as sequence-based strategies, based on a panel of four biochemical tests, one C. sakazakii isolate yielded inconclusive data and one was classified as C. malonaticus. EcoRI automated ribotyping and pulsed-field gel electrophoresis (PFGE) with XbaI separated the set into 23 unique ribotypes and 30 unique PFGE types, respectively, indicating subtype diversity within the set. Subtype and source data for the collection are publicly available in the PathogenTracker database (www.pathogentracker.net), which allows for continuous updating of information on the set, including links to publications that include information on isolates from this collection.
Yokomi, Raymond K; Selvaraj, Vijayanandraj; Maheshwari, Yogita; Saponari, Maria; Giampetruzzi, Annalisa; Chiumenti, Michela; Hajeri, Subhas
2017-07-01
Most Citrus tristeza virus (CTV) isolates in California are biologically mild and symptomless in commercial cultivars on CTV tolerant rootstocks. However, to better define California CTV isolates showing divergent serological and genetic profiles, selected isolates were subjected to deep sequencing of small RNAs. Full-length sequences were assembled, annotated and trifoliate orange resistance-breaking (RB) isolates of CTV were identified. Phylogenetic relationships based on their full genomes placed three isolates in the RB clade: CA-RB-115, CA-RB-AT25, and CA-RB-AT35. The latter two isolates were obtained by aphid transmission from Murcott and Dekopon trees, respectively, containing CTV mixtures. The California RB isolates were further distinguished into two subclades. Group I included CA-RB-115 and CA-RB-AT25 with 99% nucleotide sequence identity with RB type strain NZRB-G90; and group II included CA-RB-AT35 with 99 and 96% sequence identity with Taiwan Pumelo/SP/T1 and HA18-9, respectively. The RB phenotype was confirmed by detecting CTV replication in graft-inoculated Poncirus trifoliata and transmission from P. trifoliata to sweet orange. The California RB isolates induced mild symptoms compared with severe isolates in greenhouse indexing tests. Further examination of 570 CTV accessions, acquired from approximately 1960 and maintained in planta at the Central California Tristeza Eradication Agency, revealed 16 RB positive isolates based on partial p65 sequences. Six isolates collected from 1992 to 2011 from Tulare and Kern counties were CA-RB-115-like; and 10 isolates collected from 1968 to 2010 from Riverside, Fresno, and Kern counties were CA-RB-AT35-like. The presence of the RB genotype is relevant because P. trifoliata and its hybrids are the most popular rootstocks in California.
Singh, Deeksha; Chandrababunaidu, Mathu Malar; Panda, Arijit; Sen, Diya; Bhattacharyya, Sourav
2015-01-01
The draft genome assembly of Hassallia byssoidea strain VB512170 with a genome size of ~13 Mb and 10,183 protein-coding genes in 62 scaffolds is reported here for the first time. This is a terrestrial hydrophobic cyanobacterium isolated from monuments in India. We report several copies of luciferase and antibiotic genes in this organism. PMID:25745001
Sakai-Kawada, Francis E; Yakym, Christopher J; Helmkampf, Martin; Hagiwara, Kehau; Ip, Courtney G; Antonio, Brandi J; Armstrong, Ellie; Ulloa, Wesley J; Awaya, Jonathan D
2016-09-22
We report here the 6.0-Mb draft genome assembly of Pseudoalteromonas luteoviolacea strain IPB1 that was isolated from the Hawaiian marine sponge Iotrochota protea. Genome mining complemented with bioassay studies will elucidate secondary metabolite biosynthetic pathways and will help explain the ecological interaction between host sponge and microorganism. Copyright © 2016 Sakai-Kawada et al.
FARME DB: a functional antibiotic resistance element database
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
2017-01-01
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Ma, Zhiwei; Shen, Xuemei; Wang, Wei; Peng, Huasong; Xu, Ping; Zhang, Xuehong
2012-01-01
Sphingomonas wittichii DP58 (CCTCC M 2012027), the first reported phenazine-1-carboxylic acid (PCA)-degrading strain, was isolated from pimiento rhizosphere soils. Here we present a 5.6-Mb assembly of its genome. This sequence would contribute to the elucidation of the molecular mechanism of PCA degradation to improve the antifungal's effectiveness or remove superfluous PCA. PMID:22689229
Das, Subhadeep; Singh, Deeksha; Madduluri, Madhavi; Chandrababunaidu, Mathu Malar; Gupta, Akash
2015-01-01
We report here the draft genome sequence of Tolypothrix campylonemoides VB511288, isolated from building facades in Santiniketan, India. The members of this genus produce several compounds of commercial importance. The draft assembly is 10,627,177 bases in 135 scaffolds, and it contains 7,886 protein-coding genes, 994 pseudogenes, 18 rRNA genes, and 76 tRNA genes. PMID:25838485
Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation
Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena
2015-01-01
Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. PMID:26205871
Hasan, Nabeeh A; Warren, René L; Epperson, L Elaine; Malecha, Allyson; Alexander, David C; Turenne, Christine Y; MacMillan, Daniel; Birol, Inanc; Pleasance, Stephen; Coope, Robin; Jones, Steven J M; Romney, Marc G; Ng, Monica; Chan, Tracy; Rodrigues, Mabel; Tang, Patrick; Gardy, Jennifer L; Strong, Michael
2017-09-14
Mycobacterium chimaera, a nontuberculous mycobacterium (NTM) belonging to the Mycobacterium avium complex (MAC), is an opportunistic pathogen that can cause respiratory and disseminated disease. We report the complete genome sequence of a strain, SJ42, isolated from an immunocompromised male presenting with MAC pneumonia, assembled from Illumina and Oxford Nanopore data. Copyright © 2017 Hasan et al.
Draft Genome Sequence of Gordonia sp. Strain UCD-TK1 (Phylum Actinobacteria)
Koenigsaecker, Tynisha M.; Coil, David A.
2016-01-01
Here, we present the draft genome of Gordonia sp. strain UCD-TK1. The assembly contains 5,470,576 bp in 98 contigs. This strain was isolated from a disinfected ambulatory surgery center. PMID:27738036
Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru
2013-01-01
Background: Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings: To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance: Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859
High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome
2013-01-01
Background: Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results: Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny. Conclusions: We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts. PMID:23368932
Garcia-Hermoso, Dea; Criscuolo, Alexis; Lee, Soo Chan; Legrand, Matthieu; Chaouat, Marc; Denis, Blandine; Lafaurie, Matthieu; Rouveau, Martine; Soler, Charles; Schaal, Jean-Vivien; Mimoun, Maurice; Mebazaa, Alexandre; Heitman, Joseph; Dromer, Françoise; Brisse, Sylvain; Bretagne, Stéphane; Alanio, Alexandre
2018-04-24
Mucorales are ubiquitous environmental molds responsible for mucormycosis in diabetic, immunocompromised, and severely burned patients. Small outbreaks of invasive wound mucormycosis (IWM) have already been reported in burn units without extensive microbiological investigations. We faced an outbreak of IWM in our center and investigated the clinical isolates with whole-genome sequencing (WGS) analysis. We analyzed M. circinelloides isolates from patients in our burn unit (BU1, Hôpital Saint-Louis, Paris, France) together with nonoutbreak isolates from Burn Unit 2 (BU2, Paris area) and from France over a 2-year period (2013 to 2015). A total of 21 isolates, including 14 isolates from six BU1 patients, were analyzed by whole-genome sequencing (WGS). Phylogenetic classification based on de novo assembly and assembly free approaches showed that the clinical isolates clustered in four highly divergent clades. Clade 1 contained at least one of the strains from the six epidemiologically linked BU1 patients. The clinical isolates were specific to each patient. Two patients were infected with more than two strains from different clades, suggesting that an environmental reservoir of clonally unrelated isolates was the source of contamination. Only two patients from BU1 shared one strain, which could correspond to direct transmission or contamination with the same environmental source. In conclusion, WGS of several isolates per patients coupled with precise epidemiological data revealed a complex situation combining potential cross-transmission between patients and multiple contaminations with a heterogeneous pool of strains from a cryptic environmental reservoir. IMPORTANCE Invasive wound mucormycosis (IWM) is a severe infection due to environmental molds belonging to the order Mucorales. Severely burned patients are particularly at risk for IWM. 
Here, we used whole-genome sequencing (WGS) analysis to resolve an outbreak of IWM due to Mucor circinelloides that occurred in our hospital (BU1). We sequenced 21 clinical isolates, including 14 from BU1 and 7 unrelated isolates, and compared them to the reference genome (1006PhL). This analysis revealed that the outbreak was mainly due to multiple strains that seemed patient specific, suggesting that the patients were more likely infected from a pool of diverse strains from the environment rather than from direct transmission among them. This study revealed the complexity of a Mucorales outbreak in the settings of IWM in burn patients, which has been highlighted based on WGS combined with careful sampling. Copyright © 2018 Garcia-Hermoso et al.
NASA Astrophysics Data System (ADS)
Kempf, M. J.; Chen, F.; Quigley, M. S.; Pillai, S.; Kern, R.; Venkateswaran, K.
2001-12-01
Hydrogen peroxide vapor is currently the sterilant-of-choice for flight hardware because it is a low-heat sterilization process suitable for use with various spacecraft components. Hydrogen peroxide is a strong oxidizing agent that produces hydroxyl free radicals (·OH) which attack essential cell components, including lipids, proteins, and DNA. Planetary protection research efforts at the Jet Propulsion Laboratory (JPL) are focused on developing cleaning and sterilization technologies for spacecraft preparation prior to launch. These efforts include research to assess the microbial diversity of spacecraft assembly areas and any extreme characteristics these microbes might possess. Previous studies have shown that some heat-tolerant Bacillus species isolated from the JPL Spacecraft Assembly Facility (SAF) are resistant to recommended hydrogen peroxide vapor sterilization exposures. A Bacillus species, which was related to a hydrogen peroxide resistant strain, was repeatedly isolated from various locations in the JPL-SAF. This species was found in both unclassified (entrance floors, ante-room, and air-lock) and classified (class 100K) (floors, cabinet tops, and air) areas. The phylogenetic affiliation of these strains was determined using biochemical tests and 16S rDNA sequencing. The 16S rDNA analysis showed >99% sequence similarity to Bacillus pumilus. In order to understand the epidemiology of these strains, a more highly evolved gene (topoisomerase II β-subunit, gyrB) was also sequenced. Among 4 clades, one cluster, comprised of 3 strains isolated from the air-lock area, tightly aligned with the B. pumilus ATCC 7061 type strain (97%). The gyrB sequence similarity of this clade was only 91% with the 3 other clades. The genetic relatedness of these strains, as per pulsed-field gel electrophoresis patterns, will be presented. The vegetative cells and spores of a number of isolates were tested for their hydrogen peroxide resistance. Cells and spores were separately treated with 5% liquid hydrogen peroxide. After 60 minutes of exposure, the samples were diluted in tryptic soy broth and incubated at 32°C. Vegetative cells of one of the isolates, FO-036b, were the only cells to survive the exposure to hydrogen peroxide. In contrast, spores of several of the isolates survived exposure to hydrogen peroxide. Spores of these isolates do not appear to have any obvious morphological changes. We are in the process of analyzing these hydrogen peroxide resistant spores and comparing them to spores of microbes that are not as hydrogen peroxide resistant. The impact and implications of the identification and recurrence of these hydrogen peroxide resistant microbes, and their spores, will be discussed.
Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T
2012-01-01
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.
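Two of the sequence properties this abstract relies on, G+C% content and homopolymer runs, are simple to compute. The sketch below is illustrative only and is not taken from the study; the function names `gc_percent` and `homopolymer_runs` and the default threshold `min_len=3` are assumptions made for the example.

```python
from itertools import groupby


def gc_percent(seq):
    """G+C content as a percentage of the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def homopolymer_runs(seq, min_len=3):
    """List (start, base, length) for each run of one base >= min_len long.

    min_len=3 is an arbitrary illustrative threshold, not a value from
    the abstract.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs
```

Positions are 0-based. Listing runs this way makes it possible to check whether a suspected single-base error falls inside a homopolymer tract.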
Borelli, Guilherme; José, Juliana; Teixeira, Paulo José Pereira Lima; Dos Santos, Leandro Vieira; Pereira, Gonçalo Amarante Guimarães
2016-01-14
Candida boidinii and Candida sojae yeasts were isolated from energy cane bagasse and plague-insects. Both have fast xylose uptake rate and produce great amounts of xylitol, which are interesting features for food and 2G ethanol industries. Because they lack published genomes, we have sequenced and assembled them, offering new possibilities for gene prospection. Copyright © 2016 Borelli et al.
Li, Yong; Zhang, Weirui
2015-10-01
Microsatellite markers of Jasminum sambac (Oleaceae) were isolated to investigate wild germplasm resources and provide markers for breeding. Illumina sequencing was used to isolate microsatellite markers from the transcriptome of J. sambac. A total of 1322 microsatellites were identified from 49,772 assembled unigenes. One hundred primer pairs were randomly selected to verify primer amplification efficiency. Out of these tested primer pairs, 31 were successfully amplified: 18 primer pairs yielded a single allele, seven exhibited fixed heterozygosity with two alleles, and only six displayed polymorphisms. This study obtained the first set of microsatellite markers for J. sambac, which will be helpful for the assessment of wild germplasm resources and the development of molecular marker-assisted breeding.
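Identifying microsatellites in assembled unigenes, as described above, amounts to scanning each sequence for short motifs repeated in tandem. The abstract does not name the tool or thresholds used, so the following is only a sketch under assumed settings: motifs of 2 to 6 bp repeated at least 4 times, found with a regular-expression backreference. The function name `find_microsatellites` and all defaults are hypothetical.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats in seq.

    The lazy quantifier prefers the shortest repeating unit, so
    'AGAGAGAG' is reported as (AG)4 rather than (AGAG)2. Note that a
    homopolymer such as 'AAAAAAAA' is reported with unit 'AA'; filter
    those out if mononucleotide tracts are not wanted.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

Start positions are 0-based. In practice, primer pairs would then be designed in the flanking sequence of each hit, which this sketch does not attempt.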
Draft Genome Sequence of a Virulent Strain of Pasteurella Multocida Isolated From Alpaca
Hurtado, Raquel Enma; Aburjaile, Flavia; Mariano, Diego; Canário, Marcus Vinicius; Benevides, Leandro; Fernandez, Daniel Antonio; Allasi, Nataly Olivia; Rimac, Rocio; Juscamayta, Julio Eduardo; Maximiliano, Jorge Enrique; Rosadio, Raul Hector; Azevedo, Vasco; Maturrano, Lenin
2017-01-01
Pasteurella multocida is one of the most frequently isolated bacteria in acute pneumonia cases and is responsible for high mortality rates in young Peruvian alpacas, with consequent social and economic costs. Here we report the genome sequence of P. multocida strain UNMSM, isolated from the lung of an alpaca diagnosed with pneumonia in Peru. The genome consists of 2,439,814 base pairs assembled into 82 contigs and contains 2,252 protein-encoding genes, including known virulence-associated genes (ompH, ompA, tonB, tbpA, nanA, nanB, nanH, sodA, sodC, plpB and toxA). Further analysis could provide insights about bacterial pathogenesis and control strategies of this disease in Peruvian alpacas. PMID:28698737
Li, Xi; Sun, Long; Zhu, Yongze; Shen, Mengyuan; Tu, Yuexing
2018-04-14
The emergence of carbapenem-resistant Escherichia coli has become a serious challenge to manage in the clinic because of multidrug resistance. Here we report the draft genome sequence of NDM-3-producing E. coli strain NT1 isolated from a bloodstream infection in China. Whole genomic DNA of E. coli strain NT1 was extracted and was sequenced using an Illumina HiSeq™ X Ten platform. The generated sequence reads were assembled using CLC Genomics Workbench. The draft genome was annotated using Rapid Annotation using Subsystem Technology (RAST). Bioinformatics analysis was further performed. The genome size was calculated at 5,353,620 bp, with 5297 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, quinolones, macrolides, phenicols, sulphonamides, tetracycline and trimethoprim. In addition, genes encoding virulence factors were also identified. To our knowledge, this is the first report of an E. coli strain producing NDM-3 isolated from a human bloodstream infection. The genome sequence will provide valuable information to understand antibiotic resistance mechanisms and pathogenic mechanisms in this strain. Close surveillance is urgently needed to monitor the spread of NDM-3-producing isolates. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Grigorev, Kirill; Kliver, Sergey; Dobrynin, Pavel; Komissarov, Aleksey; Wolfsberger, Walter; Krasheninnikova, Ksenia; Afanador-Hernández, Yashira M; Brandt, Adam L; Paulino, Liz A; Carreras, Rosanna; Rodríguez, Luis E; Núñez, Adrell; Brandt, Jessica R; Silva, Filipe; Hernández-Martich, J David; Majeske, Audrey J; Antunes, Agostinho; Roca, Alfred L; O'Brien, Stephen J; Martínez-Cruzado, Juan Carlos; Oleksyk, Taras K
2018-03-16
Solenodons are insectivores living in Hispaniola and Cuba that form an isolated branch in the tree of placental mammals, highly divergent from other eulipotyphlan insectivores. The history, unique biology and adaptations of these enigmatic venomous species could be illuminated by the availability of genome data, but a whole genome assembly for solenodons has not been previously performed, partially due to the difficulty in obtaining samples from the field. Island isolation and reduced numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus), thus we tested the performance of several assembly strategies on the genome of this genetically impoverished species. The string-graph based assembly strategy seemed a better choice compared to the conventional de Bruijn graph approach, due to the high levels of homozygosity, which is often a hallmark of endemic or endangered species. A consensus reference genome was assembled from sequences of five individuals from the southern subspecies (S. p. woodi). In addition, we obtained additional sequence from one sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other, and annotated for genes, with a specific emphasis on venom genes, repeats, variable microsatellite loci and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 Mya. Patterns of SNP variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 Kya.
Genome sequence analysis of dengue virus 1 isolated in Key West, Florida.
Shin, Dongyoung; Richards, Stephanie L; Alto, Barry W; Bettinardi, David J; Smartt, Chelsea T
2013-01-01
Dengue virus (DENV) is transmitted to humans through the bite of mosquitoes. In November 2010, a dengue outbreak was reported in Monroe County in southern Florida (FL), including greater than 20 confirmed human cases. The virus collected from the human cases was verified as DENV serotype 1 (DENV-1) and one isolate was provided for sequence analysis. RNA was extracted from the DENV-1 isolate and was used in reverse transcription polymerase chain reaction (RT-PCR) to amplify PCR fragments to sequence. Nucleic acid primers were designed to generate overlapping PCR fragments that covered the entire genome. The DENV-1 isolate found in Key West (KW), FL was sequenced for whole genome characterization. Sequence assembly, Genbank searches, and recombination analyses were performed to verify the identity of the genome sequences and to determine percent similarity to known DENV-1 sequences. We show that the KW DENV-1 strain is 99% identical to Nicaraguan and Mexican DENV-1 strains. Phylogenetic and recombination analyses suggest that the DENV-1 isolated in KW originated from Nicaragua (NI) and the KW strain may circulate in KW. Also, recombination analysis results detected recombination events in the KW strain compared to DENV-1 strains from Puerto Rico. We evaluate the relative growth of KW strain of DENV-1 compared to other dengue viruses to determine whether the underlying genetics of the strain is associated with a replicative advantage, an important consideration since local transmission of DENV may result because domestic tourism can spread DENVs.
Singh, Deeksha; Chandrababunaidu, Mathu Malar; Panda, Arijit; Sen, Diya; Bhattacharyya, Sourav; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-03-05
The draft genome assembly of Hassallia byssoidea strain VB512170 with a genome size of ~13 Mb and 10,183 protein-coding genes in 62 scaffolds is reported here for the first time. This is a terrestrial hydrophobic cyanobacterium isolated from monuments in India. We report several copies of luciferase and antibiotic genes in this organism. Copyright © 2015 Singh et al.
Draft Genome Sequence of Zobellia sp. Strain OII3, Isolated from the Coastal Zone of the Baltic Sea.
Harms, Henrik; Poehlein, Anja; Thürmer, Andrea; König, Gabriele M; Schäberle, Till F
2017-09-07
Zobellia sp. strain OII3 was isolated from a marine environmental sample due to its heterotrophic lifestyle, i.e., using Escherichia coli cells as prey. It shows strong agar-lytic activity. The genome was assembled into 41 contigs with a total size of 5.4 Mb, revealing the genetic basis for natural product biosynthesis. Copyright © 2017 Harms et al.
Nasser, Kother; Mustafa, Abu Salim; Khan, Mohd Wasif; Purohit, Prashant; Al-Obaid, Inaam; Dhar, Rita; Al-Fouzan, Wadha
2018-04-19
Acinetobacter baumannii is an important opportunistic pathogen in global health care settings. Its dissemination and multidrug resistance pose an issue with treatment and outbreak control. Here, we present draft genome assemblies of six multidrug-resistant clinical strains of A. baumannii isolated from patients admitted to one of two major hospitals in Kuwait. Copyright © 2018 Nasser et al.
USDA-ARS's Scientific Manuscript database
We announce the draft genome assembly of Lactococcus garvieae str. PAQ102015-99, a recently isolated strain from an outbreak of lactococcosis at a commercial trout farm in the Northwestern US. The draft genome comprises 14 contigs totaling 2,068,357 bp with an N50 of 496,618 bp and average G+C conte...
Vashee, Sanjay; Stockwell, Timothy B; Alperovich, Nina; Denisova, Evgeniya A; Gibson, Daniel G; Cady, Kyle C; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B; Voorhies, Alexander A; Bruening, Eric; Caposio, Patrizia; Früh, Klaus
2017-01-01
Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA.
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time, it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm to efficiently engineer HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes.
Vashee, Sanjay; Stockwell, Timothy B.; Alperovich, Nina; Denisova, Evgeniya A.; Gibson, Daniel G.; Cady, Kyle C.; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B.; Voorhies, Alexander A.; Bruening, Eric; Caposio, Patrizia
2017-01-01
ABSTRACT Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA.
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time, it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm to efficiently engineer HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes. PMID:28989973
Das, Subhadeep; Singh, Deeksha; Madduluri, Madhavi; Chandrababunaidu, Mathu Malar; Gupta, Akash; Adhikary, Siba Prasad; Tripathy, Sucheta
2015-04-02
We report here the draft genome sequence of Tolypothrix campylonemoides VB511288, isolated from building facades in Santiniketan, India. The members of this genus produce several compounds of commercial importance. The draft assembly is 10,627,177 bases in 135 scaffolds, and it contains 7,886 protein-coding genes, 994 pseudogenes, 18 rRNA genes, and 76 tRNA genes. Copyright © 2015 Das et al.
Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation.
Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena; Arrizon, Javier; Sanchez-Flores, Alejandro
2015-07-23
Torulaspora delbrueckii presents metabolic features of interest for biotechnological applications (in the dairy and wine industries). Recently, the genome of T. delbrueckii CBS 1146, a strain that has been maintained under laboratory conditions since 1970, was published. Here, the genome of a new mezcal yeast was sequenced and characterized; it showed genetic differences and a higher genome assembly quality, offering a better reference genome. Copyright © 2015 Gomez-Angulo et al.
Pathak, Ashish; Chauhan, Ashvini; Ewida, Ayman Y.I.; Stothard, Paul
2016-01-01
We recently isolated Micrococcus sp. strain 2385 from the Ochlockonee River, Florida, and demonstrated potent biodegradative activity against two commonly used pesticides: alachlor [2-chloro-2′,6′-diethylphenyl-N-(methoxymethyl)acetanilide] and endosulfan [6,7,8,9,10,10-hexachloro-1,5,5a,6,9,9a-hexahydro-6,9-methano-2,3,4-benzo(e)dioxathiepin-3-oxide]. To further identify the repertoire of metabolic functions possessed by strain 2385, a draft genome sequence was obtained, assembled, annotated and analyzed. The genome sequence of Micrococcus sp. strain 2385 consisted of 1,460,461,440 bases which assembled into 175 contigs with an N50 contig length of 50,109 bases and a coverage of 600x. The genome size of this strain was estimated at 2,431,226 base pairs with a G+C content of 72.8% and a total number of 2,268 putative genes. RAST annotated a total of 340 subsystems in the genome of strain 2385 along with the presence of 2,177 coding sequences. A genome-wide survey indicated that strain 2385 harbors a plethora of genes to degrade other pollutants including caprolactam, PAHs (such as naphthalene), styrene, toluene and several chloroaromatic compounds. PMID:27672405
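The reported coverage can be checked with simple arithmetic. The sketch below assumes that the 1,460,461,440-base figure is the total number of sequenced bases and that mean coverage is that total divided by the estimated genome size; the function name is illustrative.

```python
def mean_coverage(total_sequenced_bases, genome_size_bp):
    """Average sequencing depth: total bases sequenced divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bases / genome_size_bp


# Figures quoted in the abstract for Micrococcus sp. strain 2385.
depth = mean_coverage(1_460_461_440, 2_431_226)
print(f"{depth:.1f}x")  # 600.7x
```

Under that assumption the two numbers agree with the stated coverage of roughly 600x.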
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thorell, Kaisa; Hosseini, Shaghayegh; Palacios Gonzales, Reyna Victoria Palacios
Helicobacter pylori (H. pylori) infection is one of the most common bacterial infections in humans, and it can lead to gastric ulcers and gastric cancer. H. pylori is one of the most genetically variable human pathogens, and the ability of the bacterium to bind to the host epithelium, as well as the presence of different virulence factors and genetic variants within these genes, have been associated with disease severity. Nicaragua has a particularly high gastric cancer incidence, and we therefore studied Nicaraguan clinical H. pylori isolates for factors that could contribute to cancer risk. The complete genomes of fifty-two Nicaraguan H. pylori isolates were sequenced and assembled de novo, and phylogenetic and virulence factor analyses were performed. The Nicaraguan isolates showed a phylogenetic relationship with West African isolates in whole-genome sequence comparisons and with Western and urban South and Central American isolates using MLSA (multi-locus sequence analysis). A majority (77%) of the isolates carried the cancer-associated virulence gene cagA and also the s1/i1/m1 vacuolating cytotoxin (vacA) allele combination, which is linked to increased severity of disease. Specifically, we also found that Nicaraguan isolates have a blood group-binding adhesin (BabA) variant highly similar to previously reported BabA sequences from Latin America, including from isolates belonging to other phylogenetic groups. These BabA sequences were found to be under positive selection at several amino acid positions that differed from the global collection of isolates. In conclusion, the discovery of a Latin American BabA variant, independent of overall phylogenetic background, suggests hitherto unknown host or environmental factors within the Latin American population giving H. pylori isolates carrying this adhesin variant a selective advantage, which could affect pathogenesis and risk for sequelae through specific adherence properties.
Sánchez-Nieves, Rubén; Facciotti, Marc; Saavedra-Collado, Sofía; Dávila-Santiago, Lizbeth; Rodríguez-Carrero, Roy; Montalvo-Rodríguez, Rafael
2016-03-01
The genus Haloarcula belongs to the family Halobacteriaceae and currently has 10 valid species. Here we report the draft genome sequence of strain SL3, a new species within this genus, isolated from the Solar Salterns of Cabo Rojo, Puerto Rico. Genome assembly performed using NGEN Assembler resulted in 18 contigs (N50 = 601,911 bp), the largest of which contains 1,023,775 bp. The genome consists of 3.97 Mb and has a GC content of 61.97%. Like all species of Haloarcula, the genome encodes heterogeneous copies of the small subunit ribosomal RNA. In addition, the genome includes 6 rRNAs, 48 tRNAs, and 3797 protein coding sequences. Several carbohydrate-active enzyme genes were found, as well as enzymes involved in the dihydroxyacetone processing pathway which are not found in other Haloarcula species. The NCBI accession number for this genome is LIUF00000000 and the strain deposit number is CECT9001.
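Several of the records above quote an N50 value. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length of the contig at which the cumulative sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The contig lengths in the example are made up and are not taken from any assembly in this listing.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly (hypothetical lengths): total 300, half 150.
# Cumulative sums are 100, 180, ... so the N50 contig is the 80-long one.
print(n50([40, 100, 20, 80, 60]))  # 80
```

A higher N50 for a similar total length generally indicates a less fragmented assembly, which is why it is reported alongside contig counts.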
Glynn, Neil C; Comstock, Jack C; Sood, Sushma G; Dang, Phat M; Chaparro, Jose X
2008-01-01
Resistance gene analogues (RGAs) have been isolated from many crops and offer potential in breeding for disease resistance through marker-assisted selection, either as closely linked or as perfect markers. Many R-gene sequences contain kinase domains, and indeed kinase genes have been reported as being proximal to R-genes, making kinase analogues an additionally promising target. The first step towards utilizing RGAs as markers for disease resistance is isolation and characterization of the sequences. Sugarcane clone US01-1158 was identified as resistant to yellow leaf caused by the sugarcane yellow leaf virus (SCYLV) and moderately resistant to rust caused by Puccinia melanocephala Sydow & Sydow. Degenerate primers that had previously proved useful for isolating RGAs and kinase analogues in wheat and soybean were used to amplify DNA from sugarcane (Saccharum spp.) clone US01-1158. Sequences generated from 1512 positive clones were assembled into 134 contigs of between two and 105 sequences. Comparison of the contig consensuses with the NCBI sequence database using BLASTx showed that 20 had sequence homology to nucleotide binding site and leucine rich repeat (NBS-LRR) RGAs, and eight to kinase genes. Alignment of the deduced amino acid sequences with similar sequences from the NCBI database allowed the identification of several conserved domains. The alignment and resulting phenetic tree showed that many of the sequences had greater similarity to sequences from other species than to one another. The use of degenerate primers is a useful method for isolating novel sugarcane RGA and kinase gene analogues. Further studies are needed to evaluate the role of these genes in disease resistance.
Owen, Joseph R.; Noyes, Noelle; Young, Amy E.; Prince, Daniel J.; Blanchard, Patricia C.; Lehenbauer, Terry W.; Aly, Sharif S.; Davis, Jessica H.; O’Rourke, Sean M.; Abdo, Zaid; Belk, Keith; Miller, Michael R.; Morley, Paul; Van Eenennaam, Alison L.
2017-01-01
Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease–associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none being detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates had an overall concordance rate between genotype and phenotypic resistance to the associated class of antimicrobials of 72.7% (P < 0.001), showing substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations. 
This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory disease treatment and morbidity/mortality outcomes. PMID:28739600
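The genotype-phenotype concordance rate reported above can be pictured as the fraction of tested isolate and drug-class combinations in which the genomic prediction matches the measured susceptibility result. The sketch below assumes that simple definition; the abstract does not spell out the exact calculation, and the toy data are invented for illustration.

```python
def concordance_rate(pairs):
    """Fraction of (genotype_predicts_resistance, phenotype_resistant) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one genotype/phenotype pair is required")
    agreements = sum(1 for genotype, phenotype in pairs if genotype == phenotype)
    return agreements / len(pairs)


# Hypothetical results for eight isolate/drug-class combinations (not study data):
# (resistance gene detected, isolate phenotypically resistant)
toy_results = [
    (True, True), (True, False), (False, False), (False, False),
    (False, True), (True, True), (False, False), (False, False),
]
print(f"{concordance_rate(toy_results):.1%}")  # 75.0%
```

The two disagreeing pairs in the toy data represent the two kinds of discordance: a detected gene without phenotypic resistance, and phenotypic resistance without a detected gene.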
Möbius, Petra; Hölzer, Martin; Felder, Marius; Nordsiek, Gabriele; Groth, Marco; Köhler, Heike; Reichwald, Kathrin; Platzer, Matthias; Marz, Manja
2015-01-01
Mycobacterium avium (M. a.) subsp. paratuberculosis (MAP), the etiologic agent of Johne’s disease, affects cattle, sheep, and other ruminants worldwide. To decipher phenotypic differences between sheep and cattle strains (belonging to MAP-S [Type-I/III] and MAP-C [Type-II], respectively), comparative genome analysis needs data from diverse isolates originating from different geographic regions of the world. This study presents the best-assembled genome of a MAP-S strain to date: sheep isolate JIII-386 from Germany. One newly sequenced cattle isolate (JII-1961, Germany), four published MAP strains of MAP-C and MAP-S from the United States and Australia, and M. a. subsp. hominissuis (MAH) strain 104 were used for assembly improvement and comparisons. All genomes were annotated by BacProt and results compared with NCBI (National Center for Biotechnology Information) annotation. Corresponding protein-coding sequences (CDSs) were detected, but also CDSs that were exclusively determined by either NCBI or BacProt. A new Shine–Dalgarno sequence motif (5′-AGCTGG-3′) was extracted. Novel CDSs including PE-PGRS family protein genes and about 80 noncoding RNAs exhibiting high sequence conservation are presented. Previously found genetic differences between MAP-types are partially revised. Four of ten assumed MAP-S-specific large sequence polymorphism regions (LSPSs) are still present in MAP-C strains; new LSPSs were identified. Independently of the regional origin of the strains, the number of individual CDSs and single nucleotide variants confirms the strong similarity of MAP-C strains and shows higher diversity among MAP-S strains. This study gives ambiguous results regarding the hypothesis that MAP-S is the evolutionary intermediate between MAH and MAP-C, but it clearly shows a higher similarity of MAP to MAH than to Mycobacterium intracellulare. PMID:26384038
Owen, Joseph R; Noyes, Noelle; Young, Amy E; Prince, Daniel J; Blanchard, Patricia C; Lehenbauer, Terry W; Aly, Sharif S; Davis, Jessica H; O'Rourke, Sean M; Abdo, Zaid; Belk, Keith; Miller, Michael R; Morley, Paul; Van Eenennaam, Alison L
2017-09-07
Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease-associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none being detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates had an overall concordance rate between genotype and phenotypic resistance to the associated class of antimicrobials of 72.7% (P < 0.001), showing substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations.
This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory disease treatment and morbidity/mortality outcomes. Copyright © 2017 Owen et al.
Thorell, Kaisa; Hosseini, Shaghayegh; Palacios Gonzales, Reyna Victoria Palacios; ...
2016-02-29
Helicobacter pylori (H. pylori) infection is one of the most common bacterial infections in humans, and it can lead to gastric ulcers and gastric cancer. H. pylori is one of the most genetically variable human pathogens, and the ability of the bacterium to bind to the host epithelium, as well as the presence of different virulence factors and genetic variants within these genes, have been associated with disease severity. Nicaragua has a particularly high gastric cancer incidence, and we therefore studied Nicaraguan clinical H. pylori isolates for factors that could contribute to cancer risk. The complete genomes of fifty-two Nicaraguan H. pylori isolates were sequenced and assembled de novo, and phylogenetic and virulence factor analyses were performed. The Nicaraguan isolates showed a phylogenetic relationship with West African isolates in whole-genome sequence comparisons and with Western and urban South and Central American isolates using MLSA (multi-locus sequence analysis). A majority (77%) of the isolates carried the cancer-associated virulence gene cagA and also the s1/i1/m1 vacuolating cytotoxin (vacA) allele combination, which is linked to increased severity of disease. Specifically, we also found that Nicaraguan isolates have a blood group-binding adhesin (BabA) variant highly similar to previously reported BabA sequences from Latin America, including from isolates belonging to other phylogenetic groups. These BabA sequences were found to be under positive selection at several amino acid positions that differed from the global collection of isolates. In conclusion, the discovery of a Latin American BabA variant, independent of overall phylogenetic background, suggests hitherto unknown host or environmental factors within the Latin American population giving H. pylori isolates carrying this adhesin variant a selective advantage, which could affect pathogenesis and risk for sequelae through specific adherence properties.
Oeo-Santos, Carmen; Mas, Salvador; Benedé, Sara; López-Lucendo, María; Quiralte, Joaquín; Blanca, Miguel; Mayorga, Cristobalina; Villalba, Mayte; Barderas, Rodrigo
2018-06-05
The allergenic non-specific lipid transfer protein Ole e 7 from olive pollen is a major allergen associated with severe symptoms in areas with high olive pollen levels. Despite its clinical importance, its cloning and recombinant production have not been possible by classical approaches. This study aimed to determine its complete amino acid sequence by mass spectrometry-based proteomics for its subsequent expression and characterization. To this end, the natural protein was separated on 2D gels and digested in-gel with trypsin, and CID and HCD fragmentation spectra obtained by nLC-MS/MS were analyzed using PEAKS software. Thirteen out of the 457 de novo sequenced peptides obtained allowed its full-length amino acid sequence to be assembled. Then, Ole e 7-encoding cDNA was synthesized and cloned in the pPICZαA vector for its expression in Pichia pastoris yeast. The analyses by circular dichroism, and by WB, ELISA and cell-based tests using sera and blood from olive pollen-sensitized patients, showed that rOle e 7 mostly retained the structural, allergenic and antigenic properties of the natural allergen. In summary, the rOle e 7 allergen assembled by de novo peptide sequencing by MS behaved immunologically like the natural allergen, which is scarcely isolated from pollen. Olive pollen is an important cause of allergy. The non-specific lipid binding protein Ole e 7 is a major allergen with a high incidence and a phenotype associated with severe clinical symptoms. Despite its relevance, its cloning and recombinant expression have not been possible by classical techniques. Here, we have inferred the primary amino acid sequence of Ole e 7 by mass spectrometry. We separated Ole e 7 isolated from pollen by 2DE. After in-gel digestion with trypsin and direct analysis by nLC-MS/MS in an LTQ-Orbitrap Velos, we obtained the complete repertoire of de novo sequenced peptides that allowed the primary sequence of Ole e 7 to be assembled.
After its protein expression, purification to homogeneity, and structural and immunological characterization using sera from olive pollen allergic patients and cell-based assays, we observed that the recombinant allergen retained the antigenic and allergenic properties of the natural allergen. Collectively, we show that the recombinant protein assembled by proteomics would be suitable for a better in vitro diagnosis of olive pollen allergic patients. Copyright © 2018. Published by Elsevier B.V.
Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali
2016-01-01
Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641
Chauhan, Sushma; Rahman, Hifzur; Mastan, Shaik G; Pamidimarri, D V N Sudheer; Reddy, Muppala P
2018-07-20
Begomoviruses, which belong to the family Geminiviridae, are associated with several disease symptoms, such as mosaic and leaf curling, in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas plants that were infected with begomovirus and showed acute leaf curling symptoms were identified. The DNA-A segment from the pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of begomovirus DNA-A, with ORFs distributed on both the viral and anti-viral strands. The genome size ranged from 2844 bp to 2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments with those of other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from the J. curcas host. The strains isolated in this study belong to a new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is Jatropha mosaic virus infecting J. gossypifolia. The strains isolated in this study are referred to as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by the host J. curcas. Copyright © 2018 Elsevier B.V. All rights reserved.
Li, Yong; Zhang, Weirui
2015-01-01
Premise of the study: Microsatellite markers of Jasminum sambac (Oleaceae) were isolated to investigate wild germplasm resources and provide markers for breeding. Methods and Results: Illumina sequencing was used to isolate microsatellite markers from the transcriptome of J. sambac. A total of 1322 microsatellites were identified from 49,772 assembled unigenes. One hundred primer pairs were randomly selected to verify primer amplification efficiency. Out of these tested primer pairs, 31 were successfully amplified: 18 primer pairs yielded a single allele, seven exhibited fixed heterozygosity with two alleles, and only six displayed polymorphisms. Conclusions: This study obtained the first set of microsatellite markers for J. sambac, which will be helpful for the assessment of wild germplasm resources and the development of molecular marker–assisted breeding. PMID:26504683
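The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented here, and the derived percentages are computed from the published counts rather than quoted from the study.

```python
# Counts as reported in the J. sambac abstract above.
unigenes = 49_772
microsatellites = 1_322
primer_pairs_tested = 100
single_allele = 18
fixed_heterozygous = 7
polymorphic = 6

# The three outcome classes should add up to the 31 amplified primer pairs.
amplified = single_allele + fixed_heterozygous + polymorphic


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


summary = {
    "amplified_pairs": amplified,
    "amplification_rate_pct": percent(amplified, primer_pairs_tested),
    "polymorphic_among_amplified_pct": percent(polymorphic, amplified),
    "microsatellites_per_100_unigenes": percent(microsatellites, unigenes),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The last figure is a crude density only; it assumes at most one way of counting loci per unigene and says nothing about how microsatellites are distributed among unigenes.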
Carbohydrate active enzymes revealed in Coptotermes formosanus transcriptome
USDA-ARS's Scientific Manuscript database
A normalized cDNA library of Coptotermes formosanus was constructed using mixed RNA isolated from workers, soldiers, nymphs and alates of both sexes. Sequencing of this library generated 131,637 ESTs, from which 25,939 unigenes were assembled. Carbohydrate active enzymes (CAZymes) revealed in this library we...
Moura, Quézia; Fernandes, Miriam R; Cerdeira, Louise; Santos, Ana Carolina M; de Souza, Tiago A; Ienne, Susan; Pignatari, Antonio Carlos C; Gales, Ana C; Silva, Rosa M; Lincopan, Nilton
2017-09-01
Here we report the draft genome sequence of a multidrug-resistant (MDR) Aeromonas hydrophila strain belonging to sequence type 508 (ST508) isolated from a human bloodstream infection. Assembly and annotation of this draft genome resulted in 5,028,498 bp and revealed the presence of the 16S rRNA methylase gene rmtD and the blaCTX-M-131 gene, encoding high-level resistance to aminoglycosides and cephalosporins, respectively, as well as multiple virulence genes. This draft genome can provide significant information for understanding the mechanisms underlying the establishment and treatment of infections caused by this pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Xia, Chongjing; Wang, Meinan; Yin, Chuntao; Cornejo, Omar E; Hulbert, Scot; Chen, Xianming
2018-05-24
Puccinia striiformis f. sp. tritici (Pst) causes devastating stripe (yellow) rust on wheat and P. striiformis f. sp. hordei (Psh) causes stripe rust on barley. Several Pst genomes are available, but no Psh genome is available. More genomes of Pst and Psh are needed to understand the genome evolution and molecular mechanisms of their pathogenicity. We sequenced Pst isolate 93-210 and Psh isolate 93TX-2 using PacBio and Illumina technologies, and RNA sequencing. Their genomic sequences were assembled to contigs with high continuity and showed significant structural differences. The circular mitochondria genomes of both were complete. These genomes provide high-quality resources for deciphering the genomic basis of rapid evolution and host adaptation, identifying genes for avirulence and other important traits, and studying host-pathogen interaction.
Kamada, Mayumi; Hase, Sumitaka; Fujii, Kazushi; Miyake, Masato; Sato, Kengo; Kimura, Keitarou; Sakakibara, Yasubumi
2015-01-01
Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain, B. subtilis BEST195, and the laboratory standard strain B. subtilis 168, which is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand traces a different evolutionary process from other strains.
Sequencing and comparing whole mitochondrial genomes of animals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica
2005-04-22
Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.
Miller, Marisa E; Zhang, Ying; Omidvar, Vahid; Sperschneider, Jana; Schwessinger, Benjamin; Raley, Castle; Palmer, Jonathan M; Garnica, Diana; Upadhyaya, Narayana; Rathjen, John; Taylor, Jennifer M; Park, Robert F; Dodds, Peter N; Hirsch, Cory D; Kianian, Shahryar F; Figueroa, Melania
2018-02-20
Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae. IMPORTANCE: Disease management strategies for oat crown rust are challenged by the rapid evolution of Puccinia coronata f. sp. avenae, which renders resistance genes in oat varieties ineffective. Despite the economic importance of understanding P. coronata f. sp. avenae, resources to study the molecular mechanisms underpinning pathogenicity and the emergence of new virulence traits are lacking. Such limitations are partly due to the obligate biotrophic lifestyle of P. coronata f. sp. avenae as well as the dikaryotic nature of the genome, features that are also shared with other important rust pathogens. This study reports the first release of a haplotype-phased genome assembly for a dikaryotic fungal species and demonstrates the amenability of using emerging technologies to investigate genetic diversity in populations of P. coronata f. sp. avenae. Copyright © 2018 Miller et al.
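The assembly figures quoted for the two oat crown rust isolates lend themselves to simple derived statistics. The following is a rough sketch, not the authors' analysis: the dictionary layout and function names are made up for illustration, the SNP counts are the approximate values from the abstract, and dividing by the primary assembly length is an assumption about the appropriate denominator.

```python
# Figures as reported in the abstract (SNP counts are approximate, "about").
ASSEMBLIES = {
    "12SD80": {"contigs": 603, "length_mbp": 99.16, "het_snps": 260_000},
    "12NC29": {"contigs": 777, "length_mbp": 105.25, "het_snps": 380_000},
}


def mean_contig_kbp(contigs, length_mbp):
    """Mean primary contig length in kbp (a mean, not an N50)."""
    return length_mbp * 1000.0 / contigs


def snps_per_kbp(het_snps, length_mbp):
    """Heterozygous SNPs per kbp of primary assembly."""
    return het_snps / (length_mbp * 1000.0)


derived = {
    isolate: {
        "mean_contig_kbp": mean_contig_kbp(s["contigs"], s["length_mbp"]),
        "het_snps_per_kbp": snps_per_kbp(s["het_snps"], s["length_mbp"]),
    }
    for isolate, s in ASSEMBLIES.items()
}

for isolate, stats in derived.items():
    print(isolate, {k: round(v, 2) for k, v in stats.items()})
```

A mean contig length is reported here only because the abstract gives contig counts and total lengths; an N50 would require the individual contig lengths, which the abstract does not provide.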
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond
Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils
2013-01-01
Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses. PMID:23889683
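The "more than 50-fold" complexity reduction can be sanity-checked against the two sizes given in the abstract. This is a back-of-the-envelope sketch: the constant names are invented here, the 5 Gb genome size is the rounded figure from the abstract, and the calculation assumes the 61.6 Mb target is the only fraction that needs to be sequenced.

```python
# Sizes as stated in the abstract, both expressed in megabases.
GENOME_SIZE_MB = 5_000.0      # "5 Gb size of the barley genome" (rounded)
CAPTURE_TARGET_MB = 61.6      # "61.6 megabase coding sequence target"

# Ratio of whole genome to capture target, and the target's share of the genome.
fold_reduction = GENOME_SIZE_MB / CAPTURE_TARGET_MB
target_fraction_pct = 100.0 * CAPTURE_TARGET_MB / GENOME_SIZE_MB

print(f"fold reduction: {fold_reduction:.1f}x")
print(f"target as share of genome: {target_fraction_pct:.3f}%")
```

Under these assumptions the ratio comes out above 80, which is consistent with the abstract's more conservative "more than 50-fold" wording; real-world enrichment is lower than the ideal ratio because some off-target reads are always sequenced.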
Li, Zhigang; Hu, Songnian; Yao, Nan; Dean, Ralph A.; Zhao, Wensheng; Shen, Mi; Zhang, Haiwang; Li, Chao; Liu, Liyuan; Cao, Lei; Xu, Xiaowen; Xing, Yunfei; Hsiang, Tom; Zhang, Ziding; Xu, Jin-Rong; Peng, You-Liang
2012-01-01
Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis on asynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but less than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus. PMID:22876203
2013-01-01
Background In this report we have explored the genomic and microbiological basis for a sustained increase in bloodstream infections at a major Australian hospital caused by Enterococcus faecium multi-locus sequence type (ST) 203, an outbreak strain that has largely replaced a predecessor ST17 sequence type. Results To establish a ST203 reference sequence we fully assembled and annotated the genome of Aus0085, a 2009 vancomycin-resistant Enterococcus faecium (VREfm) bloodstream isolate, and the first example of a completed ST203 genome. Aus0085 has a 3.2 Mb genome, comprising a 2.9 Mb circular chromosome and six circular plasmids (2 kb–130 kb). Twelve percent of the 3222 coding sequences (CDS) in Aus0085 are not present in ST17 E. faecium Aus0004 and ST18 E. faecium TX16. Extending this comparison to an additional 12 ST17 and 14 ST203 E. faecium hospital isolate genomes revealed only six genomic regions spanning 41 kb that were present in all ST203 and absent from all ST17 genomes. The 40 CDS have predicted functions that include ion transport, riboflavin metabolism and two phosphotransferase systems. Comparison of the vancomycin resistance-conferring Tn1549 transposon between Aus0004 and Aus0085 revealed differences in transposon length and insertion site, and van locus sequence variation that correlated with a higher vancomycin MIC in Aus0085. Additional phenotype comparisons between ST17 and ST203 isolates showed that while there were no differences in biofilm-formation and killing of Galleria mellonella, ST203 isolates grew significantly faster and out-competed ST17 isolates in growth assays. Conclusions Here we have fully assembled and annotated the first ST203 genome, and then characterized the genomic differences between ST17 and ST203 E. faecium. We also show that ST203 E. faecium are faster growing and can out-compete ST17 E. faecium. 
While a causal genetic basis for these phenotype differences is not provided here, this study revealed conserved genetic differences between the two clones, differences that can now be tested to explain the molecular basis for the success and emergence of ST203 E. faecium. PMID:24004955
[Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].
Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong
2008-05-01
One DRV strain was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR together with 3'-RACE and 5'-RACE, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp long. The DRV genome organization was similar to that of other rabies viruses: it was composed of five genes, and the initiation and termination sites were highly conserved. There were mutated amino acids in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of genes N, P, M, G and L were compared among strains with completed genomic sequences. A phylogenetic tree was constructed by comparison with the N gene sequences of other typical rabies viruses. These results indicated that DRV belonged to genotype 1. The highest homology, with the Chinese vaccine strain 3aG, was 94%, and the lowest, with WCBV, was 71%. These findings provide a theoretical reference for further research on rabies virus.
Batista, Thiago M; Moreira, Rennan G; Hilário, Heron O; Morais, Camila G; Franco, Glória R; Rosa, Luiz H; Rosa, Carlos A
2017-03-01
We present the draft genome sequence of the type strain of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T = CBS 12683 T ), a xylan-degrading species capable of fermenting d-xylose to ethanol. The assembled genome has a size of ~ 13.7 Mb and a GC content of 33.8% and contains 5971 protein-coding genes. We identified 15 genes with significant similarity to the d-xylose reductase gene from several other fungal species. The draft genome assembled from whole-genome shotgun sequencing of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T = CBS 12683 T ) has been deposited at DDBJ/ENA/GenBank under the accession number MQSX00000000 under version MQSX01000000.
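GC content, reported above as 33.8% for the assembled genome, is a simple base-composition statistic. The sketch below shows one common way to compute it; the function name and the toy sequences are hypothetical, and ignoring ambiguous bases such as N is an assumption rather than something the abstract specifies.

```python
def gc_content(sequence):
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequences for illustration only; these are not from the S. xylanicola genome.
print(round(gc_content("ATGCAT"), 2))   # two of six bases are G or C
print(gc_content("atNNgc"))             # lower case and N are tolerated
```

For a multi-contig draft assembly, the same function could be applied to the concatenation of all contigs so that longer contigs contribute proportionally more.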
Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H
2014-01-01
Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well as 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.
Mead, David A.; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Feng; Bruce, David C.; Goodwin, Lynne A.; Pitluck, Sam; Chertkov, Olga; Zhang, Xiaojing; Detter, John C.; Han, Cliff S.; Tapia, Roxanne; Land, Miriam; Hauser, Loren J.; Chang, Yun-juan; Kyrpides, Nikos C.; Ivanova, Natalia N.; Ovchinnikova, Galina; Woyke, Tanja; Brumm, Catherine; Hochstein, Rebecca; Schoenfeld, Thomas; Brumm, Phillip
2012-01-01
Paenibacillus sp. Y412MC10 was one of a number of organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The isolate was initially classified as Geobacillus sp. Y412MC10 based on its isolation conditions and similarity to other organisms isolated from hot springs at Yellowstone National Park. Comparison of 16S rRNA sequences within the Bacillales indicated that Geobacillus sp. Y412MC10 clustered with Paenibacillus species, and the organism was most closely related to Paenibacillus lautus. Lucigen Corp. prepared genomic DNA and the genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute. The genome sequence was deposited at the NCBI in October 2009 (NC_013406). The genome of Paenibacillus sp. Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2%. Comparison to other Paenibacillus species shows the organism lacks nitrogen fixation, antibiotic production and social interaction genes reported in other paenibacilli. The Y412MC10 genome shows a high level of synteny and homology to the draft sequence of Paenibacillus sp. HGF5, an organism from the Human Microbiome Project (HMP) Reference Genomes. This, combined with genomic CAZyme analysis, suggests an intestinal, rather than environmental, origin for Y412MC10. PMID:23408395
Mead, David A; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Feng; Bruce, David C; Goodwin, Lynne A; Pitluck, Sam; Chertkov, Olga; Zhang, Xiaojing; Detter, John C; Han, Cliff S; Tapia, Roxanne; Land, Miriam; Hauser, Loren J; Chang, Yun-Juan; Kyrpides, Nikos C; Ivanova, Natalia N; Ovchinnikova, Galina; Woyke, Tanja; Brumm, Catherine; Hochstein, Rebecca; Schoenfeld, Thomas; Brumm, Phillip
2012-07-30
Paenibacillus sp. Y412MC10 was one of a number of organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The isolate was initially classified as Geobacillus sp. Y412MC10 based on its isolation conditions and similarity to other organisms isolated from hot springs at Yellowstone National Park. Comparison of 16S rRNA sequences within the Bacillales indicated that Geobacillus sp. Y412MC10 clustered with Paenibacillus species, and the organism was most closely related to Paenibacillus lautus. Lucigen Corp. prepared genomic DNA and the genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute. The genome sequence was deposited at the NCBI in October 2009 (NC_013406). The genome of Paenibacillus sp. Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2%. Comparison to other Paenibacillus species shows the organism lacks nitrogen fixation, antibiotic production and social interaction genes reported in other paenibacilli. The Y412MC10 genome shows a high level of synteny and homology to the draft sequence of Paenibacillus sp. HGF5, an organism from the Human Microbiome Project (HMP) Reference Genomes. This, combined with genomic CAZyme analysis, suggests an intestinal, rather than environmental, origin for Y412MC10.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mead, David; Lucas, Susan; Copeland, A
2012-01-01
Paenibacillus sp. Y412MC10 was one of a number of organisms initially isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA. The isolate Y412MC10 was initially classified as a Geobacillus sp. based on its isolation conditions and similarity to other organisms isolated from hot springs at Yellowstone National Park. Comparison of 16S rRNA sequences within the Bacillales indicated that Geobacillus sp. Y412MC10 clustered with Paenibacillus species and not Geobacillus; the 16S rRNA analysis indicated the organism was a strain of Paenibacillus lautus. Lucigen Corp. prepared genomic DNA and the genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute. The genome of Paenibacillus lautus strain Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2%. The Paenibacillus sp. Y412MC10 genome sequence was deposited at the NCBI in October 2009 (NC_013406). Comparison to other Paenibacillus species shows the organism lacks nitrogen fixation, antibiotic production and social interaction genes reported in other paenibacilli. Over 25% of the proteins predicted by the Y412MC10 genome share no identity with the closest sequenced Paenibacillus species; most of these are predicted hypothetical proteins and their specific function in the environment is unknown.
Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella
Tyson, Gregory H.; Kabera, Claudine; Chen, Yuansha; Li, Cong; Folster, Jason P.; Ayers, Sherry L.; Lam, Claudia; Tate, Heather P.; Zhao, Shaohua
2016-01-01
Laboratory-based in vitro antimicrobial susceptibility testing is the foundation for guiding anti-infective therapy and monitoring antimicrobial resistance trends. We used whole-genome sequencing (WGS) technology to identify known antimicrobial resistance determinants among strains of nontyphoidal Salmonella and correlated these with susceptibility phenotypes to evaluate the utility of WGS for antimicrobial resistance surveillance. Six hundred forty Salmonella of 43 different serotypes were selected from among retail meat and human clinical isolates that were tested for susceptibility to 14 antimicrobials using broth microdilution. The MIC for each drug was used to categorize isolates as susceptible or resistant based on Clinical and Laboratory Standards Institute clinical breakpoints or National Antimicrobial Resistance Monitoring System (NARMS) consensus interpretive criteria. Each isolate was subjected to whole-genome shotgun sequencing, and resistance genes were identified from assembled sequences. A total of 65 unique resistance genes, plus mutations in two structural resistance loci, were identified. There were more unique resistance genes (n = 59) in the 104 human isolates than in the 536 retail meat isolates (n = 36). Overall, resistance genotypes and phenotypes correlated in 99.0% of cases. Correlations approached 100% for most classes of antibiotics but were lower for aminoglycosides and beta-lactams. We report the first finding of extended-spectrum β-lactamases (ESBLs) (blaCTX-M1 and blaSHV2a) in retail meat isolates of Salmonella in the United States. Whole-genome sequencing is an effective tool for predicting antibiotic resistance in nontyphoidal Salmonella, although the use of more appropriate surveillance breakpoints and increased knowledge of new resistance alleles will further improve correlations. PMID:27381390
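The genotype-phenotype correlation described above is, in effect, an agreement rate over isolate-drug comparisons. The sketch below shows how such a rate could be tallied; the function name and the toy data are hypothetical, and the abstract does not state whether the authors computed their 99.0% figure in exactly this way.

```python
# Study design figures from the abstract.
ISOLATES = 104 + 536          # human plus retail meat isolates
DRUGS = 14
COMPARISONS = ISOLATES * DRUGS


def concordance(pairs):
    """Summarize agreement between genotype predictions and phenotypes.

    pairs: iterable of (genotype_resistant, phenotype_resistant) booleans,
    one per isolate-drug comparison. Assumes both classes are present.
    """
    tp = fp = tn = fn = 0
    for genotype, phenotype in pairs:
        if genotype and phenotype:
            tp += 1
        elif genotype:
            fp += 1
        elif phenotype:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "agreement_pct": 100.0 * (tp + tn) / total,
        "sensitivity_pct": 100.0 * tp / (tp + fn),
        "specificity_pct": 100.0 * tn / (tn + fp),
    }


# Hypothetical toy data: 3 true positives, 5 true negatives, 1 of each error type.
toy_pairs = ([(True, True)] * 3 + [(False, False)] * 5
             + [(True, False)] + [(False, True)])
toy_summary = concordance(toy_pairs)
print(COMPARISONS, toy_summary)
```

Reporting sensitivity and specificity alongside overall agreement is a design choice made here for illustration; with resistance being relatively rare, overall agreement alone can look high even when resistant isolates are missed.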
Brehony, Carina; O'Connor, Lois; Meyler, Kenneth; Jolley, Keith A.; Bray, James; Bennett, Desiree; Maiden, Martin C. J.; Cunney, Robert
2016-01-01
A carriage study was undertaken (n = 112) to ascertain the prevalence of Neisseria spp. following the eighth case of invasive meningococcal disease in young children (5 to 46 months) and members of a large extended indigenous ethnic minority Traveller family (n = 123), typically associated with high-occupancy living conditions. Nested multilocus sequence typing (MLST) was employed for case specimen extracts. Isolates were genome sequenced and then were assembled de novo and deposited into the Bacterial Isolate Genome Sequencing Database (BIGSdb). This facilitated an expanded MLST approach utilizing large numbers of loci for isolate characterization and discrimination. A rare sequence type, ST-6697, predominated in disease specimens and isolates that were carried (n = 8/14), persisting for at least 44 months, likely driven by the high population density of houses (n = 67/112) and trailers (n = 45/112). Carriage for Neisseria meningitidis (P < 0.05) and Neisseria lactamica (P < 0.002) (2-sided Fisher's exact test) was more likely in the smaller, more densely populated trailers. Meningococcal carriage was highest in 24- to 39-year-olds (45%, n = 9/20). Evidence of horizontal gene transfer (HGT) was observed in four individuals cocolonized by Neisseria lactamica and Neisseria meningitidis. One HGT event resulted in the acquisition of 26 consecutive N. lactamica alleles. This study demonstrates how housing density can drive meningococcal transmission and carriage, which likely facilitated the persistence of ST-6697 and prolonged the outbreak. Whole-genome MLST effectively distinguished between highly similar outbreak strain isolates, including those isolated from person-to-person transmission, and also highlighted how a few HGT events can distort the true phylogenetic relationship between highly similar clonal isolates. PMID:27629899
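The carriage comparison above relies on a two-sided Fisher's exact test. A minimal pure-Python sketch of that test is given below for orientation; the function name is invented here, and the carriage counts in the example are hypothetical because the abstract does not report the underlying 2x2 tables (only the 45 trailer and 67 house totals are taken from it).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1 = a + b
    col1, col2 = a + c, b + d
    denom = comb(row1 + c + d, row1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(col2, row1 - x) / denom

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: carriers / non-carriers in trailers (45) vs houses (67).
p_value = fisher_exact_two_sided(15, 30, 9, 58)
print(f"two-sided p = {p_value:.4f}")
```

Libraries such as SciPy provide an equivalent routine; the hand-written version is used here only to keep the example self-contained.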
Draft genome sequence of Dactylonectria macrodidyma, a plant pathogenic fungus in the Nectriaceae
USDA-ARS's Scientific Manuscript database
Dactylonectria macrodidyma is part of the Nectriaceae, a family containing important plant pathogens. This species possesses the ability to induce disease on grapevine, avocado and olive. Here, we report the first draft genome of D. macrodidyma isolate JAC15-08. The assembled genome was 58 Mbp and c...
Basra, Prabh; Koziol, Adam; Wong, Alex; Carrillo, Catherine D
2015-01-08
Citrobacter braakii is a Gram-negative bacterium belonging to the Enterobacteriaceae family. Here, we report 5.2- and 5.0-Mb genome assemblies for C. braakii strains GTA-CB01 and GTA-CB04, respectively. Copyright © 2015 Basra et al.
Whole-Genome Sequencing of Two Bartonella bacilliformis Strains
Guillen, Yolanda; Casadellà, Maria; García-de-la-Guarda, Ruth; Espinoza-Culupú, Abraham; Paredes, Roger; Ruiz, Joaquim
2016-01-01
Bartonella bacilliformis is the causative agent of Carrion’s disease, a highly endemic human bartonellosis in Peru. We performed a whole-genome assembly of two B. bacilliformis strains isolated from the blood of infected patients in the acute phase of Carrion’s disease from the Cusco and Piura regions in Peru. PMID:27389274
Whole-Genome Sequence of the Soil Bacterium Micrococcus sp. KBS0714.
Kuo, V; Shoemaker, W R; Muscarella, M E; Lennon, J T
2017-08-10
We present here a draft genome assembly of Micrococcus sp. KBS0714, which was isolated from agricultural soil. The genome provides insight into the strategies that Micrococcus spp. use to contend with environmental stressors such as desiccation and starvation in environmental and host-associated ecosystems. Copyright © 2017 Kuo et al.
Dayao, Denise Ann Estarez; Seddon, Jennifer M; Gibson, Justine S; Blackall, Patrick J; Turni, Conny
2016-10-01
Macrolides are often used to treat and control bacterial pathogens causing respiratory disease in pigs. This study analyzed the whole genome sequences of one clinical isolate each of Actinobacillus pleuropneumoniae, Haemophilus parasuis, Pasteurella multocida, and Bordetella bronchiseptica, all isolated from Australian pigs, to identify the mechanism underlying the elevated minimum inhibitory concentrations (MICs) for erythromycin, tilmicosin, or tulathromycin. The H. parasuis assembled genome had a nucleotide transition at position 2059 (A to G) in the six copies of the 23S rRNA gene. This mutation has previously been associated with macrolide resistance, but this is the first reported mechanism associated with elevated macrolide MICs in H. parasuis. There was no known macrolide resistance mechanism identified in the other three bacterial genomes. However, strA and sul2, aminoglycoside and sulfonamide resistance genes, respectively, were detected in one contiguous sequence (contig 1) of the A. pleuropneumoniae assembled genome. This contig was identical to plasmids previously identified in Pasteurellaceae. This study has provided one possible explanation for the elevated macrolide MICs in H. parasuis. Further studies are necessary to clarify the mechanism causing the unexplained macrolide resistance in other Australian pig respiratory pathogens, including the role of efflux systems, which were detected in all analyzed genomes.
Eimeria genomics: Where are we now and where are we going?
Blake, Damer P
2015-08-15
The evolution of sequencing technologies, from Sanger to next generation (NGS) and now the emerging third generation, has prompted a radical frameshift moving genomics from the specialist to the mainstream. For parasitology, genomics has moved fastest for the protozoa with sequence assemblies becoming available for multiple genera including Babesia, Cryptosporidium, Eimeria, Giardia, Leishmania, Neospora, Plasmodium, Theileria, Toxoplasma and Trypanosoma. Progress has commonly been slower for parasites of animals which lack zoonotic potential, but the deficit is now being redressed with impact likely in the areas of drug and vaccine development, molecular diagnostics and population biology. Genomics studies with the apicomplexan Eimeria species clearly illustrate the approaches and opportunities available. Specifically, more than ten years after initiation of a genome sequencing project a sequence assembly was published for Eimeria tenella in 2014, complemented by assemblies for all other Eimeria species which infect the chicken and Eimeria falciformis, a parasite of the mouse. Public access to these and other coccidian genome assemblies through resources such as GeneDB and ToxoDB now promotes comparative analysis, encouraging better use of shared resources and enhancing opportunities for development of novel diagnostic and control strategies. In the short term genomics resources support development of targeted and genome-wide genetic markers such as single nucleotide polymorphisms (SNPs), with whole genome re-sequencing becoming viable in the near future. Experimental power will develop rapidly as additional species, strains and isolates are sampled with particular emphasis on population structure and allelic diversity. Copyright © 2015 Elsevier B.V. All rights reserved.
Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph
2017-01-01
Abstract Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
Advances in Cryptococcus genomics: insights into the evolution of pathogenesis.
Cuomo, Christina A; Rhodes, Johanna; Desjardins, Christopher A
2018-01-01
Cryptococcus species are the causative agents of cryptococcal meningitis, a significant source of mortality in immunocompromised individuals. Initial work on the molecular epidemiology of this fungal pathogen utilized genotyping approaches to describe the genetic diversity and biogeography of two species, Cryptococcus neoformans and Cryptococcus gattii. Whole genome sequencing of representatives of both species resulted in reference assemblies enabling a wide array of downstream studies and genomic resources. With the increasing availability of whole genome sequencing, both species have now had hundreds of individual isolates sequenced, providing fine-scale insight into the evolution and diversification of Cryptococcus and allowing for the first genome-wide association studies to identify genetic variants associated with human virulence. Sequencing has also begun to examine the microevolution of isolates during prolonged infection and to identify variants specific to outbreak lineages, highlighting the potential role of hyper-mutation in evolving within short time scales. We can anticipate that further advances in sequencing technology and sequencing microbial genomes at scale, including metagenomics approaches, will continue to refine our view of how the evolution of Cryptococcus drives its success as a pathogen.
Kamada, Mayumi; Hase, Sumitaka; Fujii, Kazushi; Miyake, Masato; Sato, Kengo; Kimura, Keitarou; Sakakibara, Yasubumi
2015-01-01
Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and its relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genome of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168 that is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from “Tua Nao” of Thailand traces a different evolutionary process from other strains. PMID:26505996
Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T G; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D; Sollid, Johanna U Ericson
2014-11-01
Objectives Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Methods Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. Results SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A–G), of which four (A–D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. Conclusions The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. PMID:25038069
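The three-way gene categorization in this abstract (Type I, II, III) can be expressed as a simple rule over a presence/absence table. The following is a minimal sketch assuming that homologous genes have already been grouped into clusters; the cluster and strain names and the function `classify_gene_clusters` are hypothetical and are not taken from the study's pipeline.

```python
def classify_gene_clusters(presence, n_strains):
    """Assign each homologue cluster to Type I, II, or III.

    `presence` maps a cluster id to the set of strains carrying a homologue.
    Type I: present in all strains; Type III: present in exactly one strain;
    Type II: shared by a subgroup (more than one strain, but not all).
    """
    categories = {}
    for cluster, strains in presence.items():
        if not strains:
            continue  # a cluster observed in no strain carries no information
        if len(strains) == n_strains:
            categories[cluster] = "Type I (core)"
        elif len(strains) == 1:
            categories[cluster] = "Type III (unique)"
        else:
            categories[cluster] = "Type II (unique core)"
    return categories


# Hypothetical toy input with three strains.
example = {
    "cluster_a": {"strain_1", "strain_2", "strain_3"},
    "cluster_b": {"strain_1", "strain_3"},
    "cluster_c": {"strain_2"},
}
for cluster, label in classify_gene_clusters(example, n_strains=3).items():
    print(cluster, label)
```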
The History of Bordetella pertussis Genome Evolution Includes Structural Rearrangement
Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Bowden, Katherine E.; Burroughs, Mark; Cassiday, Pamela K.; Davis, Jamie K.; Johnson, Taccara; Juieng, Phalasy; Knipe, Kristen; Mathis, Marsenia H.; Pruitt, Andrea M.; Rowe, Lori; Sheth, Mili; Tondella, M. Lucia; Williams, Margaret M.
2017-01-01
ABSTRACT Despite high pertussis vaccine coverage, reported cases of whooping cough (pertussis) have increased over the last decade in the United States and other developed countries. Although Bordetella pertussis is well known for its limited gene sequence variation, recent advances in long-read sequencing technology have begun to reveal genomic structural heterogeneity among otherwise indistinguishable isolates, even within geographically or temporally defined epidemics. We have compared rearrangements among complete genome assemblies from 257 B. pertussis isolates to examine the potential evolution of the chromosomal structure in a pathogen with minimal gene nucleotide sequence diversity. Discrete changes in gene order were identified that differentiated genomes from vaccine reference strains and clinical isolates of various genotypes, frequently along phylogenetic boundaries defined by single nucleotide polymorphisms. The observed rearrangements were primarily large inversions centered on the replication origin or terminus and flanked by IS481, a mobile genetic element with >240 copies per genome and previously suspected to mediate rearrangements and deletions by homologous recombination. These data illustrate that structural genome evolution in B. pertussis is not limited to reduction but also includes rearrangement. Therefore, although genomes of clinical isolates are structurally diverse, specific changes in gene order are conserved, perhaps due to positive selection, providing novel information for investigating disease resurgence and molecular epidemiology. IMPORTANCE Whooping cough, primarily caused by Bordetella pertussis, has resurged in the United States even though the coverage with pertussis-containing vaccines remains high. The rise in reported cases has included increased disease rates among all vaccinated age groups, provoking questions about the pathogen's evolution. The chromosome of B. pertussis includes a large number of repetitive mobile genetic elements that obstruct genome analysis. However, these mobile elements facilitate large rearrangements that alter the order and orientation of essential protein-encoding genes, which otherwise exhibit little nucleotide sequence diversity. By comparing the complete genome assemblies from 257 isolates, we show that specific rearrangements have been conserved throughout recent evolutionary history, perhaps by eliciting changes in gene expression, which may also provide useful information for molecular epidemiology. PMID:28167525
Hatahet, Feras; Blazyk, Jessica L; Martineau, Eugenie; Mandela, Eric; Zhao, Yongxin; Campbell, Robert E; Beckwith, Jonathan; Boyd, Dana
2015-12-08
Functional overexpression of polytopic membrane proteins, particularly when in a foreign host, is often a challenging task. Factors that negatively affect such processes are poorly understood. Using the mammalian membrane protein vitamin K epoxide reductase (VKORc1) as a reporter, we describe a genetic selection approach allowing the isolation of Escherichia coli mutants capable of functionally expressing this blood-coagulation enzyme. The isolated mutants map to components of membrane protein assembly and quality control proteins YidC and HslV. We show that changes in the VKORc1 sequence and in the YidC hydrophilic groove along with the inactivation of HslV promote VKORc1 activity and dramatically increase its expression level. We hypothesize that such changes correct for mismatches in the membrane topogenic signals between E. coli and eukaryotic cells guiding proper membrane integration. Furthermore, the obtained mutants allow the study of VKORc1 reaction mechanisms, inhibition by warfarin, and the high-throughput screening for potential anticoagulants. PMID:26598701
Melo, Ricardo Rodrigues de; Persinoti, Gabriela Felix; Paixão, Douglas Antonio Alvaredo; Squina, Fábio Márcio; Ruller, Roberto; Sato, Helia Harumi
Here, we present the draft genome sequence of Streptomyces sp. F1, a strain isolated from soil with great potential for secretion of hydrolytic enzymes used to deconstruct cellulosic biomass. The draft genome assembly of Streptomyces sp. strain F1 has 69 contigs with a total genome size of 8,142,296 bp and a G+C content of 72.65%. Preliminary genome analysis identified 175 proteins as Carbohydrate-Active Enzymes, including 85 glycoside hydrolases organized in 33 distinct families. This draft genome information provides new insights into the key genes encoding hydrolytic enzymes involved in biomass deconstruction employed by soil bacteria. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
Chalker, Victoria J; Smith, Alyson; Al-Shahib, Ali; Botchway, Stella; Macdonald, Emily; Daniel, Roger; Phillips, Sarah; Platt, Steven; Doumith, Michel; Tewolde, Rediat; Coelho, Juliana; Jolley, Keith A; Underwood, Anthony; McCarthy, Noel D
2016-06-01
Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of 2 independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.
Figueroa-Montiel, Andrea; Ramos, Marco A; Mares, Rosa E; Dueñas, Salvador; Pimienta, Genaro; Ortiz, Ernesto; Possani, Lourival D; Licea-Navarro, Alexei F
2016-01-01
Small peptides isolated from the venom of the marine snails belonging to the genus Conus have been widely studied because of their therapeutic value. These peptides can be classified in two groups. The largest one is composed of peptides rich in disulfide bonds, referred to as conotoxins. Despite the importance of conotoxins given their pharmacological value, little is known about the protein disulfide isomerase (PDI) enzymes that are required to catalyze their correct folding. To discover the PDIs that may participate in the folding and structural maturation of conotoxins, the transcriptomes of the venom duct of four different species of Conus from the peninsula of Baja California (Mexico) were assembled. Complementary DNA (cDNA) libraries were constructed for each species and sequenced using a Genome Analyzer Illumina platform. The raw RNA-seq data were converted into transcript sequences using Trinity, a de novo assembler that allows the grouping of reads into contigs without a reference genome. An N50 value of 605 was established as a reference for future assemblies of Conus transcriptomes using this software. Transdecoder was used to extract likely coding sequences from Trinity transcripts, and the PDI-specific sequence motif "APWCGHCK" was used to capture potential PDIs. An in silico analysis was performed to characterize the group of PDI protein sequences encoded by the duct transcriptome of each species. The computational approach entailed a structural homology characterization, based on the presence of functional Thioredoxin-like domains. Four different PDI families were characterized, which are constituted by a total of 41 different gene sequences. The sequences had an average of 65% identity with other PDIs. Homology-based three-dimensional structure prediction of a subset of the reported sequences, performed with MODELLER 9.14, showed the expected thioredoxin fold, which was confirmed by a "simulated annealing" method.
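The motif-capture step in this abstract can be illustrated with a plain substring search over predicted protein sequences. This is a minimal sketch under the assumption of an exact, ungapped match to "APWCGHCK"; the function name `find_motif_hits` and the example sequences are hypothetical and do not come from the study.

```python
PDI_MOTIF = "APWCGHCK"  # motif quoted in the abstract


def find_motif_hits(proteins, motif=PDI_MOTIF):
    """Return {sequence name: [1-based start positions]} for exact motif matches.

    `proteins` maps a sequence name to its amino acid sequence. Sequences
    without a match are left out of the result.
    """
    hits = {}
    for name, sequence in proteins.items():
        sequence = sequence.upper()
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)  # report 1-based coordinates
            start = sequence.find(motif, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Invented toy sequences, for illustration only.
predicted_proteins = {
    "orf_1": "MKLLAVAPWCGHCKQLAPEY",
    "orf_2": "MSTNPKPQRKTKRNT",
}
print(find_motif_hits(predicted_proteins))
```

An exact match is the strictest possible filter; a real screen might tolerate substitutions within the motif, which this sketch does not attempt.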
Pepo aphid-borne yellows virus: a new species in the genus Polerovirus.
Ibaba, Jacques D; Laing, Mark D; Gubba, Augustine
2017-02-01
Pepo aphid-borne yellows virus (PABYV) has been proposed as a putative representative of a new species in the genus Polerovirus in the family Luteoviridae. The genomes of two South African (SA) isolates of cucurbit-infecting PABYV are described in this record. Total RNA, extracted from leaf samples of a pattypan (Cucurbita pepo L.) and a baby marrow (C. pepo L.), was subjected to next-generation sequencing (NGS) on the HiSeq Illumina platform. Sanger sequencing was subsequently used to authenticate the integrity of the PABYV genome generated from de novo assembly of the NGS data. The PABYV genome of the SA isolates consists of 5813 nucleotides and displays an organisation typical of poleroviruses. Genome sequence comparisons of the SA PABYV isolates to other poleroviruses support the classification of PABYV as a new species in the genus Polerovirus. Recombination analyses showed that PABYV and Cucurbit aphid-borne yellows virus (CABYV) shared the same ancestor for the genome part situated between breaking points. Phylogenetic analyses of the RNA-dependent RNA polymerase and the coat protein genes showed that SA PABYV isolates share a distant relationship with CABYV and Suakwa aphid-borne yellows virus. Based on our results, we propose that PABYV is a distinct species in the genus Polerovirus.
Sánchez-Nieves, Rubén; Facciotti, Marc T; Saavedra-Collado, Sofía; Dávila-Santiago, Lizbeth; Rodríguez-Carrero, Roy; Montalvo-Rodríguez, Rafael
2016-03-01
The genus Halorubrum is a member of the family Halobacteriaceae which currently has the highest number of described species (31) of all the haloarchaea. Here we report the draft genome sequence of strain V5, a new species within this genus that was isolated from the solar salterns of Cabo Rojo, Puerto Rico. Assembly yielded 17 contigs (N50 = 515,834 bp), the largest of which contains 1,031,026 bp. The genome is 3.57 Mb in length with a G + C content of 67.6%. In general, the genome includes 4 rRNAs, 52 tRNAs, and 3246 protein-coding sequences. The NCBI accession number for this genome is LIST00000000 and the strain deposit number is CECT9000.
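The N50 statistic quoted for this assembly is the length of the contig at which the cumulative length, summed from the longest contig downward, first reaches half of the total assembly length. The following is a minimal sketch of that standard definition; the contig lengths in the example are invented toy values, not those of strain V5.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50 %.
        if running * 2 >= total:
            return length


# Toy lengths: total 300; 100 + 80 = 180 is the first sum >= 150.
print(n50([100, 80, 60, 40, 20]))
```

A larger N50 relative to genome size indicates a more contiguous assembly, which is why it is usually reported alongside the contig count and the longest contig.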
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Genome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genome and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar transcripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellar genomes, which is 1/10th as much genetic diversity as is found in the nucleus.
Microbial community assembly and evolution in subseafloor sediment.
Starnawski, Piotr; Bataillon, Thomas; Ettema, Thijs J G; Jochum, Lara M; Schreiber, Lars; Chen, Xihan; Lever, Mark A; Polz, Martin F; Jørgensen, Bo B; Schramm, Andreas; Kjeldsen, Kasper U
2017-03-14
Bacterial and archaeal communities inhabiting the subsurface seabed live under strong energy limitation and have growth rates that are orders of magnitude slower than laboratory-grown cultures. It is not understood how subsurface microbial communities are assembled and whether populations undergo adaptive evolution or accumulate mutations as a result of impaired DNA repair under such energy-limited conditions. Here we use amplicon sequencing to explore changes of microbial communities during burial and isolation from the surface to the >5,000-y-old subsurface of marine sediment and identify a small core set of mostly uncultured bacteria and archaea that is present throughout the sediment column. These persisting populations constitute a small fraction of the entire community at the surface but become predominant in the subsurface. We followed patterns of genome diversity with depth in four dominant lineages of the persisting populations by mapping metagenomic sequence reads onto single-cell genomes. Nucleotide sequence diversity was uniformly low and did not change with age and depth of the sediment. Likewise, there was no detectable change in mutation rates and efficacy of selection. Our results indicate that subsurface microbial communities predominantly assemble by selective survival of taxa able to persist under extreme energy limitation.
Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella.
McDermott, Patrick F; Tyson, Gregory H; Kabera, Claudine; Chen, Yuansha; Li, Cong; Folster, Jason P; Ayers, Sherry L; Lam, Claudia; Tate, Heather P; Zhao, Shaohua
2016-09-01
Laboratory-based in vitro antimicrobial susceptibility testing is the foundation for guiding anti-infective therapy and monitoring antimicrobial resistance trends. We used whole-genome sequencing (WGS) technology to identify known antimicrobial resistance determinants among strains of nontyphoidal Salmonella and correlated these with susceptibility phenotypes to evaluate the utility of WGS for antimicrobial resistance surveillance. Six hundred forty Salmonella of 43 different serotypes were selected from among retail meat and human clinical isolates that were tested for susceptibility to 14 antimicrobials using broth microdilution. The MIC for each drug was used to categorize isolates as susceptible or resistant based on Clinical and Laboratory Standards Institute clinical breakpoints or National Antimicrobial Resistance Monitoring System (NARMS) consensus interpretive criteria. Each isolate was subjected to whole-genome shotgun sequencing, and resistance genes were identified from assembled sequences. A total of 65 unique resistance genes, plus mutations in two structural resistance loci, were identified. There were more unique resistance genes (n = 59) in the 104 human isolates than in the 536 retail meat isolates (n = 36). Overall, resistance genotypes and phenotypes correlated in 99.0% of cases. Correlations approached 100% for most classes of antibiotics but were lower for aminoglycosides and beta-lactams. We report the first finding of extended-spectrum β-lactamases (ESBLs) (blaCTX-M1 and blaSHV2a) in retail meat isolates of Salmonella in the United States. Whole-genome sequencing is an effective tool for predicting antibiotic resistance in nontyphoidal Salmonella, although the use of more appropriate surveillance breakpoints and increased knowledge of new resistance alleles will further improve correlations. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Genome Sequences of Four Staphylococcus aureus Strains Isolated from Bovine Mastitis
Taponen, Suvi; Koort, Joanna; Paulin, Lars; Åvall-Jääskeläinen, Silja
2015-01-01
Staphylococcus aureus is a major causative agent of mastitis in dairy cows. The pathogenicity of S. aureus may vary; it is able to cause severe clinical mastitis, but most often it is associated with chronic subclinical mastitis. Here, we present the genome assemblies of four S. aureus strains from bovine mastitis. PMID:25908141
Draft genome sequence of the fish pathogen Flavobacterium columnare strain CSF-298-10
USDA-ARS's Scientific Manuscript database
We announce the genome assembly of Flavobacterium columnare strain CSF-298-10, a strain isolated from an outbreak of columnaris disease at a commercial trout farm in the Snake River Valley, Idaho, USA. The complete genome consists of 13 contigs totaling 3,284,579 bp, with an average G+C content of 31.5% and 2933...
Bunka, David H J; Lane, Stephen W; Lane, Claire L; Dykeman, Eric C; Ford, Robert J; Barker, Amy M; Twarock, Reidun; Phillips, Simon E V; Stockley, Peter G
2011-10-14
Using a recombinant, T=1 Satellite Tobacco Necrosis Virus (STNV)-like particle expressed in Escherichia coli, we have established conditions for in vitro disassembly and reassembly of the viral capsid. In vivo assembly is dependent on the presence of the coat protein (CP) N-terminal region, and in vitro assembly requires RNA. Using immobilised CP monomers under reassembly conditions with "free" CP subunits, we have prepared a range of partially assembled CP species for RNA aptamer selection. SELEX directed against the RNA-binding face of the STNV CP resulted in the isolation of several clones, one of which (B3) matches the STNV-1 genome in 16 out of 25 nucleotide positions, including across a statistically significant 10/10 stretch. This 10-base region folds into a stem-loop displaying the motif ACAA and has been shown to bind to STNV CP. Analysis of the other aptamer sequences reveals that the majority can be folded into stem-loops displaying versions of this motif. Using a sequence and secondary structure search motif to analyse the genomic sequence of STNV-1, we identified 30 stem-loops displaying the sequence motif AxxA. The implication is that there are many stem-loops in the genome carrying essential recognition features for binding STNV CP. Secondary structure predictions of the genomic RNA using Mfold showed that only 8 out of 30 of these stem-loops would be formed in the lowest-energy structure. These results are consistent with an assembly mechanism based on kinetically driven folding of the RNA. Copyright © 2011 Elsevier Ltd. All rights reserved.
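The search for stem-loops displaying an AxxA motif can be sketched as follows. The abstract does not specify the authors' search motif, so everything here is an assumption made for illustration: the function name `find_axxa_stem_loops`, a loop of exactly four nucleotides, a fixed stem length, and the acceptance of G-U wobble pairs.

```python
import re

# Base pairs accepted in the stem (Watson-Crick plus G-U wobble; an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_axxa_stem_loops(rna, stem_len=3):
    """Return (start, end) spans of hairpins whose 4-nt loop matches A..A.

    A hit needs stem_len bases on each side of the loop that pair with
    each other when read in antiparallel orientation.
    """
    hits = []
    # The lookahead lets overlapping candidate loops be examined.
    for match in re.finditer(r"(?=A[ACGU]{2}A)", rna):
        loop_start = match.start()
        loop_end = loop_start + 4  # exclusive
        left, right = loop_start - stem_len, loop_end + stem_len
        if left < 0 or right > len(rna):
            continue
        five_prime = rna[left:loop_start]
        three_prime = rna[loop_end:right]
        if all((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime))):
            hits.append((left, right))
    return hits


# Hypothetical toy sequence: GGC / ACAA loop / GCC.
toy_hits = find_axxa_stem_loops("GGCACAAGCC")
print(toy_hits)
```

A purely sequence-local scan like this finds every candidate hairpin; it says nothing about which of them form in the lowest-energy fold, which is the distinction the abstract draws using Mfold.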
Utilizing Gene Tree Variation to Identify Candidate Effector Genes in Zymoseptoria tritici
McDonald, Megan C.; McGinness, Lachlan; Hane, James K.; Williams, Angela H.; Milgate, Andrew; Solomon, Peter S.
2016-01-01
Zymoseptoria tritici is a host-specific, necrotrophic pathogen of wheat. Infection by Z. tritici is characterized by its extended latent period, which typically lasts 2 wks, and is followed by extensive host cell death, and rapid proliferation of fungal biomass. This work characterizes the level of genomic variation in 13 isolates, for which we have measured virulence on 11 wheat cultivars with differential resistance genes. Between the reference isolate, IPO323, and the 13 Australian isolates we identified over 800,000 single nucleotide polymorphisms, of which ∼10% had an effect on the coding regions of the genome. Furthermore, we identified over 1700 probable presence/absence polymorphisms in genes across the Australian isolates using de novo assembly. Finally, we developed a gene tree sorting method that quickly identifies groups of isolates within a single gene alignment whose sequence haplotypes correspond with virulence scores on a single wheat cultivar. Using this method, we have identified < 100 candidate effector genes whose gene sequence correlates with virulence toward a wheat cultivar carrying a major resistance gene. PMID:26837952
Nicolás, Marisa F.; Ramos, Pablo Ivan Pereira; Marques de Carvalho, Fabíola; Camargo, Dhian R. A.; de Fátima Morais Alves, Carlene; Loss de Morais, Guilherme; Almeida, Luiz G. P.; Souza, Rangel C.; Ciapina, Luciane P.; Vicente, Ana C. P.; Coimbra, Roney S.; Ribeiro de Vasconcelos, Ana T.
2018-01-01
The aim of this study was to unravel the genetic determinants responsible for multidrug (including carbapenems) resistance and virulence in a clinical isolate of Klebsiella quasipneumoniae subsp. similipneumoniae by whole-genome sequencing and comparative analyses. Eighty-three clinical isolates initially identified as carbapenem-resistant K. pneumoniae were collected from nosocomial infections in southeast Brazil. After RAPD screening, the KPC-142 isolate, showing the most divergent DNA pattern, was selected for complete genome sequencing in an Illumina HiSeq 2500 instrument. Reads were assembled into scaffolds, gaps between scaffolds were resolved by in silico gap filling and extensive bioinformatics analyses were performed, using multiple comparative analysis tools and databases. Genome sequencing allowed the classification of the KPC-142 isolate to be corrected to K. quasipneumoniae subsp. similipneumoniae. To the best of our knowledge this is the first complete genome reported to date of a clinical isolate of this subspecies harboring both class A beta-lactamases KPC-2 and OKP-B-6 from South America. KPC-142 has one 5.2 Mbp chromosome (57.8% G+C) and two plasmids: 190 Kbp pKQPS142a (50.7% G+C) and 11 Kbp pKQPS142b (57.3% G+C). The 3 Kbp region in pKQPS142b containing blaKPC-2 was found to be highly similar to that of pKp13d of K. pneumoniae Kp13 isolated in Southern Brazil in 2009, suggesting the horizontal transfer of this resistance gene between different species of Klebsiella. KPC-142 additionally harbors an integrative conjugative element ICEPm1 that could be involved in the mobilization of pKQPS142b and determinants of resistance to other classes of antimicrobials, including aminoglycosides and silver. We present the completely assembled genome sequence of a clinical isolate of K. quasipneumoniae subsp. similipneumoniae, a producer of the KPC-2 and OKP-B-6 beta-lactamases, and discuss the most relevant genomic features of this important resistant pathogen in comparison to several strains belonging to K. quasipneumoniae subsp. similipneumoniae (phylogroup II-B), K. quasipneumoniae subsp. quasipneumoniae (phylogroup II-A), K. pneumoniae (phylogroup I), and K. variicola (phylogroup III). Our study contributes to the description of the characteristics of a novel K. quasipneumoniae subsp. similipneumoniae strain circulating in South America that currently represents a serious potential risk for nosocomial settings. PMID:29503635
Hyndman, Timothy H; Marschang, Rachel E; Wellehan, James F X; Nicholls, Philip K
2012-10-01
This paper describes the isolation and molecular identification of a novel paramyxovirus found during an investigation of an outbreak of neurorespiratory disease in a collection of Australian pythons. Using Illumina® high-throughput sequencing, a 17,187 nucleotide sequence was assembled from RNA extracts from infected viper heart cells (VH2) displaying widespread cytopathic effects in the form of multinucleate giant cells. The sequence appears to contain all the coding regions of the genome, including the following predicted paramyxoviral open reading frames (ORFs): 3'--Nucleocapsid (N)--putative Phosphoprotein (P)--Matrix (M)--Fusion (F)--putative attachment protein--Polymerase (L)--5'. There is also a 540 nucleotide ORF between the N and putative P genes that may be an additional coding region. Phylogenetic analyses of the complete N, M, F and L genes support the clustering of this virus within the family Paramyxoviridae but outside both of the current subfamilies: Paramyxovirinae and Pneumovirinae. We propose to name this new virus, Sunshine virus, after the geographic origin of the first isolate--the Sunshine Coast of Queensland, Australia. Copyright © 2012 Elsevier B.V. All rights reserved.
Wu, Xiaodong; Wu, Xiaoyun; Li, Wenbin; Cheng, Xiaofei
2018-05-01
Through sequencing and assembly of small RNAs, an orthotospovirus was identified from a celtuce plant (Lactuca sativa var. augustana) showing vein clearing and chlorotic spots in the Zhejiang province of China. The S, M, and L RNAs of this orthotospovirus were determined to be 3146, 4734, and 8934 nt, respectively, and shared 30.4-72.5%, 43.4-80.8%, and 29.84-82.9% nucleotide sequence identities with that of known orthotospoviruses. The full length nucleoprotein (N) of this orthotospovirus shared highest amino acid sequence identity (90.25%) with that of calla lily chlorotic spot virus isolated from calla lily (CCSV-calla) [China: Taiwan: 2001] and tobacco (CCSV-LJ1) [China: Lijiang: 2014]. Phylogenetic analyses showed that this orthotospovirus is phylogenetically associated with CCSV isolates and clustered with CCSV, tomato zonate spot virus (TZSV), and tomato necrotic spot-associated virus (TNSaV) in a separate sub-branch. These results suggest that this orthotospovirus is a divergent isolate of CCSV and was thus named CCSV-Cel [China: Zhejiang: 2017].
Isolation and characterization of a virus infecting the freshwater algae Chrysochromulina parva
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mirza, S.F.; Staniewski, M.A.; Short, C.M.
Water samples from Lake Ontario, Canada were tested for lytic activity against the freshwater haptophyte algae Chrysochromulina parva. A filterable lytic agent was isolated and identified as a virus via transmission electron microscopy and molecular methods. The virus, CpV-BQ1, is icosahedral, ca. 145 nm in diameter, assembled within the cytoplasm, and has a genome size of ca. 485 kb. Sequences obtained through PCR-amplification of DNA polymerase (polB) genes clustered among sequences from the family Phycodnaviridae, whereas major capsid protein (MCP) sequences clustered among sequences from either the Phycodnaviridae or Mimiviridae. Based on quantitative molecular assays, C. parva's abundance in Lake Ontario was relatively stable, yet CpV-BQ1's abundance was variable, suggesting complex virus-host dynamics. This study demonstrates that CpV-BQ1 is a member of the proposed order Megavirales with characteristics of both phycodnaviruses and mimiviruses, indicating that, in addition to its complex ecological dynamics, it also has a complex evolutionary history. Highlights: • A virus infecting the algae C. parva was isolated from Lake Ontario. • Virus characteristics demonstrated that this novel virus is an NCLDV. • The virus's polB sequence suggests taxonomic affiliation with the Phycodnaviridae. • The virus's capsid protein sequences also suggest Mimiviridae ancestry. • Surveys of host and virus natural abundances revealed complex host–virus dynamics.
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.
2017-01-01
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, L.E.; Detter, C.; Barrie, K.
2006-06-01
Sequencing of the large (>50 kb), low-copy-number (<5 per cell) plasmids that mediate horizontal gene transfer has been hindered by the difficulty and expense of isolating DNA from individual plasmids of this class. We report here that a kit method previously devised for purification of bacterial artificial chromosomes (BACs) can be adapted for effective preparation of individual plasmids up to 220 kb from wild gram-negative and gram-positive bacteria. Individual plasmid DNA recovered from less than 10 ml of Escherichia coli, Staphylococcus, and Corynebacterium cultures was of sufficient quantity and quality for construction of high-coverage libraries, as shown by sequencing five native plasmids ranging in size from 30 kb to 94 kb. We also report recommendations for vector screening to optimize plasmid sequence assembly, preliminary annotation of novel plasmid genomes, and insights on mobile genetic element biology derived from these sequences. Adaptation of this BAC method for large plasmid isolation removes one major technical hurdle to expanding our knowledge of the natural plasmid gene pool.
Comparative genomics of Beauveria bassiana: uncovering signatures of virulence against mosquitoes.
Valero-Jiménez, Claudio A; Faino, Luigi; Spring In't Veld, Daphne; Smit, Sandra; Zwaan, Bas J; van Kan, Jan A L
2016-12-01
Entomopathogenic fungi such as Beauveria bassiana are promising biological agents for control of malaria mosquitoes. Indeed, infection with B. bassiana reduces the lifespan of mosquitoes in the laboratory and in the field. Natural isolates of B. bassiana show up to 10-fold differences in virulence between the most and the least virulent isolate. In this study, we sequenced the genomes of five isolates representing the extremes of low/high virulence and three RNA libraries, and applied a genome comparison approach to uncover genetic mechanisms underpinning virulence. A high-quality, near-complete genome assembly was achieved for the highly virulent isolate Bb8028, which was compared to the assemblies of the four other isolates. Whole genome analysis showed a high level of genetic diversity between the five isolates (2.85-16.8 SNPs/kb), which grouped into two distinct phylogenetic clusters. Mating type gene analysis revealed the presence of either the MAT1-1-1 or the MAT1-2-1 gene. Moreover, a putative new MAT gene (MAT1-2-8) was detected in the MAT1-2 locus. Comparative genome analysis revealed that Bb8028 contains 163 genes exclusive for this isolate. These unique genes have a tendency to cluster in the genome and to be often located near the telomeres. Among the genes unique to Bb8028 are a Non-Ribosomal Peptide Synthetase (NRPS) secondary metabolite gene cluster, a polyketide synthase (PKS) gene, and five genes with homology to bacterial toxins. A survey of candidate virulence genes for B. bassiana is presented. Our results indicate several genes and molecular processes that may underpin virulence towards mosquitoes. Thus, the genome sequences of five isolates of B. bassiana provide a better understanding of the natural variation in virulence and will offer a major resource for future research on this important biological control agent.
Pritchard, Leighton; Holden, Nicola J; Bielaszewska, Martina; Karch, Helge; Toth, Ian K
2012-01-01
An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 'positive' E. coli O104:H4 outbreak and 32 'negative' non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. 
We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact.
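The way two individually imperfect primer sets can combine into a perfectly specific assay can be sketched with a small calculation. This is an illustration under stated assumptions, not the authors' validation code: the function name `sensitivity_specificity`, the AND rule for combining primers, and all toy amplification results are hypothetical.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for boolean calls against boolean truth."""
    pairs = list(zip(predicted, truth))
    true_pos = sum(1 for p, t in pairs if p and t)
    true_neg = sum(1 for p, t in pairs if not p and not t)
    positives = sum(1 for _, t in pairs if t)
    negatives = len(pairs) - positives
    return true_pos / positives, true_neg / negatives


# Hypothetical toy panel: two outbreak isolates followed by three non-outbreak isolates.
truth = [True, True, False, False, False]
primer_a = [True, True, True, False, False]   # one false positive
primer_b = [True, True, False, True, False]   # a different false positive

# Call an isolate "outbreak" only if both primer sets amplify (assumed AND rule).
combined = [a and b for a, b in zip(primer_a, primer_b)]

print(sensitivity_specificity(primer_a, truth))
print(sensitivity_specificity(combined, truth))
```

The combination helps only when the two primer sets give false positives on different isolates, which is why choosing a minimal complementary pair matters.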
Huang, Yao-Ting; Cheng, Jan-Fang; Chen, Shi-Yu; Hong, Yu-Kai; Wu, Zong-Yen; Liu, Po-Yu
2018-06-19
Shewanella algae is an environmental marine bacterium and an emerging opportunistic human pathogen. Moreover, there are increasing reports of strains showing multi-drug resistance, particularly carbapenem-resistant isolates. Although S. algae has been found in bivalve shellfish aquaculture, there is very little genome-wide data on resistance determinants in S. algae from shellfish. In this study, we aimed to determine the whole genome sequence of carbapenem-resistant S. algae strain AC isolated from small abalone in Taiwan. Genomic DNA was sequenced on an Illumina MiSeq platform using 250 bp paired-end reads. De novo genome assembly was performed using Velvet v1.2.07. The whole genome was annotated and several candidate genes for antimicrobial resistance were identified. The genome size was calculated at 4,751,156 bp, with a mean G+C content of 53.09%. A total of 4,164 protein-coding sequences, 7 rRNAs, 85 tRNAs, and 5 non-coding RNAs were identified. The genome contains genes associated with resistance to β-lactams, trimethoprim, tetracycline, colistin, and quinolones. Multiple efflux pump genes were also detected. Small abalone is a potential source of foodborne drug-resistant S. algae. The genome sequence of a carbapenem-resistant S. algae strain AC isolated from small abalone will provide valuable information for further study of the dissemination of resistance genes at the human-animal interface. Copyright © 2018. Published by Elsevier Ltd.
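Summary figures such as genome size and mean G+C content are typically computed directly from the assembled contigs. The following is a minimal sketch of that calculation, assuming the assembly is available as FASTA text; the function name `fasta_stats` and the toy contigs are hypothetical and are not taken from the study.

```python
def fasta_stats(fasta_text):
    """Return (total_bases, gc_percent) over all sequences in FASTA-formatted text."""
    total = 0
    gc = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    if total == 0:
        raise ValueError("no sequence data found")
    return total, 100.0 * gc / total


# Hypothetical two-contig assembly.
toy_fasta = ">contig1\nGGCC\nAATT\n>contig2\nGC\n"
toy_size, toy_gc = fasta_stats(toy_fasta)
print(toy_size, f"{toy_gc:.2f}%")
```

Ambiguous bases such as N are counted toward the total length here; whether to exclude them is a design choice that slightly changes the reported G+C percentage.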
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.
Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario
2011-01-01
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.
2005-08-26
Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.
Zhu, Yafeng; Engström, Pär G; Tellgren-Roth, Christian; Baudo, Charles D; Kennell, John C; Sun, Sheng; Billmyre, R Blake; Schröder, Markus S; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L; Heitman, Joseph; Scheynius, Annika; Lehtiö, Janne
2017-03-17
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
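The percentage gains quoted above follow directly from the gene counts given in the abstract. A short check, with `percent_increase` as an illustrative helper name, shows the arithmetic:

```python
def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return 100.0 * (after - before) / before


# Gene counts as stated in the abstract.
transcriptomics_only = 3612
with_proteomics = 4113
after_manual_curation = 4493

proteomics_gain = percent_increase(transcriptomics_only, with_proteomics)
curation_gain = percent_increase(with_proteomics, after_manual_curation)
print(f"{proteomics_gain:.1f}% then {curation_gain:.1f}%")
```

Note that each percentage is relative to the preceding count, so the 9% curation gain is measured against 4113 genes rather than against the original 3612.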
Johnston, Chad W; Skinnider, Michael A; Wyatt, Morgan A; Li, Xiang; Ranieri, Michael R M; Yang, Lian; Zechel, David L; Ma, Bin; Magarvey, Nathan A
2015-09-28
Bacterial natural products are a diverse and valuable group of small molecules, and genome sequencing indicates that the vast majority remain undiscovered. The prediction of natural product structures from biosynthetic assembly lines can facilitate their discovery, but highly automated, accurate, and integrated systems are required to mine the broad spectrum of sequenced bacterial genomes. Here we present a genome-guided natural products discovery tool to automatically predict, combinatorialize and identify polyketides and nonribosomal peptides from biosynthetic assembly lines using LC-MS/MS data of crude extracts in a high-throughput manner. We detail the directed identification and isolation of six genetically predicted polyketides and nonribosomal peptides using our Genome-to-Natural Products platform. This highly automated, user-friendly programme provides a means of realizing the potential of genetically encoded natural products.
NASA Astrophysics Data System (ADS)
Jones, Frances Patricia; Clark, Ian M.; King, Robert; Shaw, Liz J.; Woodward, Martin J.; Hirsch, Penny R.
2016-05-01
The slow-growing genus Bradyrhizobium is biologically important in soils, with different representatives found to perform a range of biochemical functions including photosynthesis, induction of root nodules and symbiotic nitrogen fixation and denitrification. Consequently, the role of the genus in soil ecology and biogeochemical transformations is of agricultural and environmental significance. Some isolates of Bradyrhizobium have been shown to be non-symbiotic and do not possess the ability to form nodules. Here we present the genome and gene annotations of two such free-living Bradyrhizobium isolates, named G22 and BF49, from soils with differing long-term management regimes (grassland and bare fallow respectively) in addition to carbon metabolism analysis. These Bradyrhizobium isolates are the first to be isolated and sequenced from European soil and are the first free-living Bradyrhizobium isolates, lacking both nodulation and nitrogen fixation genes, to have their genomes sequenced and assembled from cultured samples. The G22 and BF49 genomes are distinctly different with respect to size and number of genes; the grassland isolate also contains a plasmid. There are also a number of functional differences between these isolates and other published genomes, suggesting that this ubiquitous genus is extremely heterogeneous and has roles within the community not including symbiotic nitrogen fixation.
Gualtieri, Gustavo; Conner, Joann A.; Morishige, Daryl T.; Moore, L. David; Mullet, John E.; Ozias-Akins, Peggy
2006-01-01
Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory. PMID:16415213
Hannou, Najat; Mondy, Samuel; Planamente, Sara; Moumni, Mohieddine; Llop, Pablo; López, María; Manceau, Charles; Barny, Marie-Anne; Faure, Denis
2013-10-01
Erwinia amylovora causes economic losses that affect pear and apple production in Morocco. Here, we report comparative genomics of four Moroccan E. amylovora strains with the European strain CFBP1430 and North-American strain ATCC49946. Analysis of single nucleotide polymorphisms (SNPs) revealed the genetic homogeneity of the Moroccan strains and their proximity to the European strain CFBP1430. Moreover, the collected sequences allowed the assembly of a 65 kbp plasmid, which is highly similar to the plasmid pEI70 harbored by several European E. amylovora isolates. This plasmid was found in 33% of the 40 E. amylovora strains collected from several host plants in 2009 and 2010 in Morocco. Copyright © 2013 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Rodriguez-R, Luis M; Gunturu, Santosh; Harvey, William T; Rosselló-Mora, Ramon; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T
2018-06-14
The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.
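The ANI concept mentioned in the abstract above can be illustrated with a toy calculation. The sketch below is not MiGA's implementation: it assumes the query fragments have already been aligned to the reference as equal-length strings, and the function names and the 0.7 identity cutoff are hypothetical choices made only for illustration.

```python
from typing import Iterable, Optional, Tuple


def fragment_identity(query: str, reference: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length fragments."""
    if len(query) != len(reference) or not query:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query, reference))
    return matches / len(query)


def average_nucleotide_identity(
    pairs: Iterable[Tuple[str, str]], min_identity: float = 0.7
) -> Optional[float]:
    """Mean identity over fragment pairs that pass the (assumed) identity cutoff.

    Returns None when no fragment pair passes the cutoff.
    """
    identities = [fragment_identity(q, r) for q, r in pairs]
    kept = [i for i in identities if i >= min_identity]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Three toy fragment pairs: identical, one mismatch, and unrelated.
toy_pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),
    ("ACGTACGTAC", "ACGTACGTAA"),
    ("ACGTACGTAC", "TTTTTTTTTT"),
]
toy_ani = average_nucleotide_identity(toy_pairs)
print(toy_ani)
```

The unrelated pair falls below the cutoff and is excluded, so only the two conserved fragments contribute to the average. Real tools obtain the fragment alignments from a sequence search step, which this sketch leaves out.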
Molecular Microbial Analyses of the Mars Exploration Rovers Assembly Facility
NASA Technical Reports Server (NTRS)
Venkateswaran, Kasthuri; LaDuc, Myron T.; Newcombe, David; Kempf, Michael J.; Koke, John. A.; Smoot, James C.; Smoot, Laura M.; Stahl, David A.
2004-01-01
During space exploration, the control of terrestrial microbes associated with robotic space vehicles intended to land on extraterrestrial solar system bodies is necessary to prevent forward contamination and maintain scientific integrity during the search for life. Microorganisms associated with the spacecraft assembly environment can be a source of contamination for the spacecraft. In this study, we have monitored the microbial burden of air samples of the Mars Exploration Rovers' assembly facility at the Kennedy Space Center utilizing complementary diagnostic tools. To estimate the microbial burden and identify potential contaminants in the assembly facility, several microbiological techniques were used including culturing, cloning and sequencing of 16S rRNA genes, DNA microarray analysis, and ATP assays to assess viable microorganisms. Culturing severely underestimated types and amounts of contamination since many of the microbes implicated by molecular analyses were not cultivable. In addition to the cultivation of Agrobacterium, Burkholderia and Bacillus species, the cloning approach retrieved 16S rDNA sequences of oligotrophs, symbionts, and γ-proteobacteria members. DNA microarray analysis based on rational probe design and dissociation curves complemented existing molecular techniques and produced a highly parallel, high resolution analysis of contaminating microbial populations. For instance, strong hybridization signals to probes targeting the Bacillus species indicated that members of this species were present in the assembly area samples; however, differences in dissociation curves between perfect-match and air sample sequences showed that these samples harbored nucleotide polymorphisms. Vegetative cells of several isolates were resistant when subjected to treatments of UVC (254 nm) and vapor H2O2 (4 mg/L).
This study further validates the significance of non-cultivable microbes in association with spacecraft assembly facilities, as our analyses have identified several non-cultivable microbes likely to contaminate the surfaces of spacecraft hardware.
USDA-ARS's Scientific Manuscript database
In previous work, we reported on the isolation and genome sequence analysis of Bacillus cereus strain tsu1 NCBI accession number JPYN00000000. The 36 scaffolds in the assembled tsu1 genome were all aligned with B. cereus B4264 genome with variations. Genes encoding for xylanase and cellulase and the...
Draft Genome Sequence of the Tyramine Producer Enterococcus durans Strain IPLA 655
Ladero, Victor; Linares, Daniel M.; del Rio, Beatriz; Fernandez, Maria; Martin, M. Cruz
2013-01-01
We here report a 3.059-Mbp draft assembly for the genome of Enterococcus durans strain IPLA 655. This dairy isolate provides a model for studying the regulation of the biosynthesis of tyramine (a toxic compound). These results should aid our understanding of tyramine production and allow tyramine accumulation in food to be reduced. PMID:23682153
Long-read sequencing data analysis for yeasts.
Yue, Jia-Xing; Liti, Gianni
2018-06-01
Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.
Rusch, Douglas B; Halpern, Aaron L; Sutton, Granger; Heidelberg, Karla B; Williamson, Shannon; Yooseph, Shibu; Wu, Dongying; Eisen, Jonathan A; Hoffman, Jeff M; Remington, Karin; Beeson, Karen; Tran, Bao; Smith, Hamilton; Baden-Tillson, Holly; Stewart, Clare; Thorpe, Joyce; Freeman, Jason; Andrews-Pfannkoch, Cynthia; Venter, Joseph E; Li, Kelvin; Kravitz, Saul; Heidelberg, John F; Utterback, Terry; Rogers, Yu-Hui; Falcón, Luisa I; Souza, Valeria; Bonilla-Rosso, Germán; Eguiarte, Luis E; Karl, David M; Sathyendranath, Shubha; Platt, Trevor; Bermingham, Eldredge; Gallardo, Victor; Tamayo-Castillo, Giselle; Ferrari, Michael R; Strausberg, Robert L; Nealson, Kenneth; Friedman, Robert; Frazier, Marvin; Venter, J. Craig
2007-01-01
The world's oceans contain a complex mixture of micro-organisms that are for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS. PMID:17355176
Gan, Han Ming; Thomas, Bolaji N; Cavanaugh, Nicole T; Morales, Grace H; Mayers, Ashley N; Savka, Michael A; Hudson, André O
2017-01-01
In industry, the yeast Rhodotorula mucilaginosa is commonly used for the production of carotenoids. The production of carotenoids is important because they are used as natural colorants in food and some carotenoids are precursors of retinol (vitamin A). However, the identification and molecular characterization of the carotenoid pathway/s in species belonging to the genus Rhodotorula is scarce due to the lack of genomic information thus potentially impeding effective metabolic engineering of these yeast strains for improved carotenoid production. In this study, we report the isolation, identification, characterization and the whole nuclear genome and mitogenome sequence of the endophyte R. mucilaginosa RIT389 isolated from Distemonanthus benthamianus, a plant known for its anti-fungal and antibacterial properties and commonly used as chewing sticks. The assembled genome of R. mucilaginosa RIT389 is 19 Mbp in length with an estimated genomic heterozygosity of 9.29%. Whole genome phylogeny supports the species designation of strain RIT389 within the genus in addition to supporting the monophyly of the currently sequenced Rhodotorula species. Further, we report for the first time, the recovery of the complete mitochondrial genome of R. mucilaginosa using the genome skimming approach. The assembled mitogenome is at least 7,000 bases larger than that of Rhodotorula taiwanensis which is largely attributed to the presence of large intronic regions containing open reading frames coding for homing endonuclease from the LAGLIDADG and GIY-YIG families. Furthermore, genomic regions containing the key genes for carotenoid production were identified in R. mucilaginosa RIT389, revealing differences in gene synteny that may play a role in the regulation of the biotechnologically important carotenoid synthesis pathways in yeasts. PMID:29158974
Benardini, James N; Vaishampayan, Parag A; Schwendner, Petra; Swanner, Elizabeth; Fukui, Youhei; Osman, Sharif; Satomi, Masakata; Venkateswaran, Kasthuri
2011-06-01
A novel Gram-positive, motile, endospore-forming, aerobic bacterium was isolated from the NASA Phoenix Lander assembly clean room that exhibits 100 % 16S rRNA gene sequence similarity to two strains isolated from a deep subsurface environment. All strains are rod-shaped, endospore-forming bacteria, whose endospores are resistant to UV radiation up to 500 J m(-2). A polyphasic taxonomic study including traditional phenotypic tests, fatty acid analysis, 16S rRNA gene sequencing and DNA-DNA hybridization analysis was performed to characterize these novel strains. The 16S rRNA gene sequencing convincingly grouped these novel strains within the genus Paenibacillus as a separate cluster from previously described species. The similarity of 16S rRNA gene sequences among the novel strains was identical but only 98.1 to 98.5 % with their nearest neighbours Paenibacillus barengoltzii ATCC BAA-1209(T) and Paenibacillus timonensis CIP 108005(T). The menaquinone MK-7 was dominant in these novel strains as shown in other species of the genus Paenibacillus. The DNA-DNA hybridization dissociation value was <45 % with the closest related species. The novel strains had DNA G+C contents of 51.9 to 52.8 mol%. Phenotypically, the novel strains can be readily differentiated from closely related species by the absence of urease and gelatinase and the production of acids from a variety of sugars including l-arabinose. The major fatty acid was anteiso-C(15 : 0) as seen in P. barengoltzii and P. timonensis whereas the proportion of C(16 : 0) was significantly different from the closely related species. Based on phylogenetic and phenotypic results, it was concluded that these strains represent a novel species of the genus Paenibacillus, for which the name Paenibacillus phoenicis sp. nov. is proposed. The type strain is 3PO2SA(T) ( = NRRL B-59348(T) = NBRC 106274(T)).
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites
Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng
2016-01-01
HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined from the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and the sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance.
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Complete genome sequences of Geobacillus sp. WCH70, a thermophilic strain isolated from wood compost
Brumm, Phillip; Land, Miriam L.; Mead, David
2016-04-27
Geobacillus sp. WCH70 was one of several thermophilic organisms isolated from hot composts in the Middleton, WI area. Comparison of 16S rRNA sequences showed the strain may be a new species, and is most closely related to G. galactosidasius and G. toebii. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2009 (CP001638). The genome of Geobacillus species WCH70 consists of one circular chromosome of 3,893,306 bp with an average G + C content of 43 %, and two circular plasmids of 33,899 and 10,287 bp with an average G + C content of 40 %. Among sequenced organisms, Geobacillus sp. WCH70 shares highest Average Nucleotide Identity (86 %) with G. thermoglucosidasius strains, as well as similar genome organization. Geobacillus sp. WCH70 appears to be a highly adaptable organism, with an exceptionally high 125 annotated transposons in the genome. The organism also possesses four predicted restriction-modification systems not found in other Geobacillus species.
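The replicon sizes and G + C percentages quoted in the abstract above are simple summary statistics. A minimal sketch of how such figures are typically derived follows; the function name and the replicon labels are hypothetical, and only the three replicon lengths are taken from the abstract.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / acgt


# Replicon lengths in bp as reported in the abstract; the labels are illustrative.
REPLICON_LENGTHS = {
    "chromosome": 3_893_306,
    "plasmid_1": 33_899,
    "plasmid_2": 10_287,
}

# Total genome size is the sum over all replicons.
total_bp = sum(REPLICON_LENGTHS.values())
print(f"total genome size: {total_bp:,} bp")

# G + C content of a short toy sequence (not taken from the genome).
print(f"toy G+C: {gc_content('GGCCAATT'):.0%}")
```

Summing the three replicons gives the overall genome size. Ambiguous bases such as N are excluded from the denominator here, which is one common convention rather than the only one.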
Moura, Quézia; Fernandes, Miriam R; Cerdeira, Louise; Nhambe, Lúcia F; Ienne, Susan; Souza, Tiago A; Lincopan, Nilton
2017-09-01
Multidrug-resistant (MDR) Enterobacter aerogenes strains are frequently associated with nosocomial infections and high mortality rates, representing a serious public health problem. The aim of this study was to present the draft genome sequence of a MDR KPC-2-producing E. aerogenes isolated from a perineal swab of a hospitalised patient in Brazil. Genomic DNA was sequenced using an Illumina MiSeq platform. De novo genome assembly was carried out using the A5-Miseq pipeline, and whole-genome sequence analysis was performed using tools from the Center for Genomic Epidemiology. The strain harboured resistance genes to β-lactams, aminoglycosides, sulphonamides and trimethoprim in addition to genes encoding multidrug efflux system proteins, a quaternary ammonium transporter and heavy metal efflux system proteins. In addition, the strain harboured genes encoding diverse virulence factors. These data might allow a better understanding of the genetic basis of antimicrobial resistance and virulence in E. aerogenes strains. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Sánchez-Rangel, Diana; Hernández-Domínguez, Eric; Pérez-Torres, Claudia-Anahí; Ortiz-Castro, Randy; Villafán, Emanuel; Alonso-Sánchez, Alexandro; Rodríguez-Haas, Benjamín; López-Buenfil, Abel; García-Avila, Clemente; Ramírez-Pool, José-Abrahán
2017-01-01
Here, we report the genome of Fusarium euwallaceae strain HFEW-16-IV-019, an isolate obtained from Kuroshio shot hole borer (a Euwallacea sp.). These beetles were collected in Tijuana, Mexico, from elm trees showing typical symptoms of Fusarium dieback. The final assembly consists of 287 scaffolds spanning 48,274,071 bp and 13,777 genes. PMID:28860245
Draft Genome Sequence of Desulfovibrio BerOc1, a Mercury-Methylating Strain
Gassie, Claire; Bouchez, Oliver; Klopp, Christophe; Guyoneaud, Rémy
2017-01-01
Desulfovibrio BerOc1 is a sulfate-reducing bacterium isolated from the Berre lagoon (French Mediterranean coast). BerOc1 is able to methylate and demethylate mercury. The genome size is 4,081,579 bp assembled into five contigs. We identified the hgcA and hgcB genes involved in mercury methylation, but not those responsible for mercury demethylation. PMID:28104657
Guo, Yong; Deng, Xiao; Liang, Yuan; Zhang, Liang; Zhao, Guo-Ping; Zhou, Yan
2018-06-26
Group B Streptococcus (GBS) is a human commensal bacterium that is capable of causing several infectious diseases in infants and in people with chronic diseases. GBS has been the most common cause of urinary tract infections in the elderly, but relatively few studies have reported on urine-isolated GBS and their antimicrobial susceptibilities. Hence, we decided to investigate GBS specifically isolated from urine in Suzhou, China. 27 GBS samples were isolated from urine in Suzhou, China. PCR and agarose gel electrophoresis were used to identify the serotype distribution. Susceptibility tests were based on the MIC test and the Kirby-Bauer test. Genomes were sequenced on the Illumina HiSeq platform and assembled with SPAdes. Genomes of five isolates were sequenced and submitted to the NCBI genome database. The sequencing files in fastq format were submitted to the NCBI SRA database. Five serotypes were identified. The resistance rates measured for tetracycline, erythromycin, clindamycin and fluoroquinolones were 74.1, 63.0, 44.4 and 48.1%, respectively. 18.5% of the isolates were nonsusceptible to nitrofurantoin. The resistance to tetracycline was mainly associated with the gene tetM. The erythromycin resistance was mainly associated with the genes ermB and mefE. The genes ermB and lnuB were the prevalent genes in the cMLSB type. No known nitrofurantoin resistance gene was found in nitrofurantoin-nonsusceptible GBS. Five serotypes were identified in our study. High rates of GBS isolates were resistant to tetracycline, erythromycin, clindamycin and fluoroquinolones. The genes ermB and lnuB were highly prevalent in the cMLSB phenotype.
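The abstract above reports rates but not the underlying counts. As a hedged worked example, the sketch below back-calculates the isolate counts that the reported percentages would imply for 27 isolates and checks that each count reproduces the reported figure after rounding. The counts are inferred here, not stated in the source, and the function and variable names are hypothetical.

```python
N_ISOLATES = 27  # number of urine-derived GBS isolates given in the abstract

# Percentages as reported in the abstract.
REPORTED_PERCENT = {
    "tetracycline": 74.1,
    "erythromycin": 63.0,
    "clindamycin": 44.4,
    "fluoroquinolones": 48.1,
    "nitrofurantoin (nonsusceptible)": 18.5,
}


def implied_counts(reported: dict, n: int) -> dict:
    """Map each drug to (implied isolate count, percentage recomputed from that count)."""
    result = {}
    for drug, percent in reported.items():
        count = round(percent / 100 * n)           # nearest whole number of isolates
        recomputed = round(100 * count / n, 1)     # back to one decimal place
        result[drug] = (count, recomputed)
    return result


counts = implied_counts(REPORTED_PERCENT, N_ISOLATES)
for drug, (count, recomputed) in counts.items():
    print(f"{drug}: {count}/{N_ISOLATES} = {recomputed}%")
```

Under this assumption the implied counts are 20, 17, 12, 13 and 5 of 27 isolates, each of which rounds back to the reported percentage.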
Kim, Hyung Jun; Jang, Soojin
2017-12-01
Staphylococcus haemolyticus is the second most frequently isolated coagulase-negative staphylococcus from blood cultures. Moreover, multidrug resistance associated with the genome flexibility of S. haemolyticus has been increasingly reported worldwide. Here we report the draft genome sequence of multidrug-resistant S. haemolyticus IPK_TSA25 isolated from a building surface in South Korea. Genomic DNA of S. haemolyticus IPK_TSA25 was sequenced using the PacBio RS II sequencing platform. Generated reads were assembled using PacBio SMRT Analysis 2.3.0. The draft genome was annotated and antibiotic resistance genes were identified. The genome of 2,517,398 bp contains various antibiotic resistance genes associated with resistance to β-lactams, aminoglycosides and macrolides. Genome analysis also revealed chromosomal integration of the full-length Staphylococcus aureus plasmid pS0385-1 containing a tetracycline resistance gene. The genome sequence reported in this study will provide valuable information to understand the flexibility of the S. haemolyticus genome, which facilitates acquisition of antibiotic resistance genes and contributes to the dissemination of antibiotic resistance by this emerging pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Sellera, Fábio P; Fernandes, Miriam R; Moura, Quézia; Souza, Tiago A; Nascimento, Cristiane L; Cerdeira, Louise; Lincopan, Nilton
2018-03-01
The incidence of multidrug-resistant bacteria in wildlife animals has been investigated to improve our knowledge of the spread of clinically relevant antimicrobial resistance genes. The aim of this study was to report the first draft genome sequence of an extensively drug-resistant (XDR) Pseudomonas aeruginosa ST644 isolate recovered from a Magellanic penguin with a footpad infection (bumblefoot) undergoing rehabilitation. The genome was sequenced on an Illumina NextSeq® platform using 150-bp paired-end reads. De novo genome assembly was performed using Velvet v.1.2.10, and the whole genome sequence was evaluated using bioinformatics approaches from the Center for Genomic Epidemiology, whereas an in-house method (mapping of raw whole genome sequence reads) was used to identify chromosomal point mutations. The genome size was calculated at 6,436,450 bp, with 6357 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, phenicols, sulphonamides, tetracyclines, quinolones and fosfomycin; in addition, mutations in the genes gyrA (Thr83Ile), parC (Ser87Leu), phoQ (Arg61His) and pmrB (Tyr345His), conferring resistance to quinolones and polymyxins, respectively, were confirmed. This draft genome sequence can provide useful information for comparative genomic analysis regarding the dissemination of clinically significant antibiotic resistance genes and XDR bacterial species at the human-animal interface. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Draft genome of a Xanthomonas perforans strain associated with pith necrosis.
Torelli, Emanuela; Aiello, Dalia; Polizzi, Giancarlo; Firrao, Giuseppe; Cirvilleri, Gabriella
2015-02-01
Xanthomonas perforans causes bacterial spot of tomato and pepper. A genome draft of an unusual isolate (strain 4P1S2), differing in that it was associated with stem pith necrosis, was assembled from Illumina MiSeq sequencing data using the draft of X. perforans strain 91-118 as a reference. The resulting draft (accession number JRWW00000000) largely overlapped with the reference draft. In addition, the reads not mapping to the reference assembly were selected and used for a further assembly that revealed a large putative plasmid. The analysis of the predicted proteins showed only a few gene features that could potentially be implicated in a switch in phytopathological behavior. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.
Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C
2017-03-31
The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.
Genomics of high molecular weight plasmids isolated from an on-farm biopurification system.
Martini, María C; Wibberg, Daniel; Lozano, Mauricio; Torres Tejerizo, Gonzalo; Albicoro, Francisco J; Jaenicke, Sebastian; van Elsas, Jan Dirk; Petroni, Alejandro; Garcillán-Barcia, M Pilar; de la Cruz, Fernando; Schlüter, Andreas; Pühler, Alfred; Pistorio, Mariano; Lagares, Antonio; Del Papa, María F
2016-06-20
The use of biopurification systems (BPS) constitutes an efficient strategy to eliminate pesticides from polluted wastewaters from farm activities. BPS environments contain a high microbial density and diversity facilitating the exchange of information among bacteria, mediated by mobile genetic elements (MGEs), which play a key role in bacterial adaptation and evolution in such environments. Here we sequenced and characterized high-molecular-weight plasmids from a bacterial collection of an on-farm BPS. The high-throughput sequencing of the plasmid pool yielded a total of several Mb of sequence information. Assembly of the sequence data resulted in six complete replicons. Using in silico analyses we identified plasmid replication genes whose encoded proteins represent 13 different Pfam families, as well as proteins involved in plasmid conjugation, indicating a large diversity of plasmid replicons and suggesting the occurrence of horizontal gene transfer (HGT) events within the habitat analyzed. In addition, genes conferring resistance to 10 classes of antimicrobial compounds and those encoding enzymes potentially involved in pesticide and aromatic hydrocarbon degradation were found. Global analysis of the plasmid pool suggests that the analyzed BPS represents a key environment for further studies addressing the dissemination of MGEs carrying catabolic genes and pathway assembly regarding degradation capabilities.
Coughlan, Simone; Taylor, Ali Shirley; Feane, Eoghan; Sanders, Mandy; Schonian, Gabriele; Cotton, James A.
2018-01-01
The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. (Viannia) braziliensis and L. (V.) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. (V.) naiffi and L. (V.) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia: aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia, there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni, L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3’ end of chromosome 34. This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance. PMID:29765675
Orlovskis, Zigmunds; Canale, Maria Cristina; Haryono, Mindia; Lopes, João Roberto Spotti
2017-01-01
Background and Aims Maize bushy stunt phytoplasma (MBSP) is a bacterial pathogen of maize (Zea mays L.) across Latin America. MBSP belongs to the 16SrI-B sub-group within the genus ‘Candidatus Phytoplasma’. MBSP and its insect vector Dalbulus maidis (Hemiptera: Cicadellidae) are restricted to maize; both are thought to have coevolved with maize during its domestication from a teosinte-like ancestor. MBSP-infected maize plants show a diversity of symptoms, and it is likely that MBSP is under strong selection for increased virulence and insect transmission on maize hybrids that are widely grown in Brazil. This study investigated whether the differences in genome sequences of MBSP isolates from two maize-growing regions in South-east Brazil explain variations in symptom severity of the MBSP isolates on various maize genotypes. Methods MBSP isolates were collected from maize production fields in Guaíra and Piracicaba in South-east Brazil for infection assays. One representative isolate was chosen for de novo whole-genome assembly and for the alignment of sequence reads from the genomes of other phytoplasma isolates to detect polymorphisms. Statistical methods were applied to investigate the correlation between variations in disease symptoms of infected maize plants and MBSP sequence polymorphisms. Key Results MBSP isolates contributed consistently to organ proliferation symptoms and maize genotype to leaf necrosis, reddening and yellowing of infected maize plants. The symptom differences are associated with polymorphisms in a phase-variable lipoprotein, which is a candidate effector, and an ATP-dependent lipoprotein ABC export protein, whereas no polymorphisms were observed in other candidate effector genes. Lipoproteins and ABC export proteins activate host defence responses, regulate pathogen attachment to host cells and activate effector secretion systems in other pathogens.
Conclusions Polymorphisms in two putative virulence genes among MBSP isolates from maize-growing regions in South-east Brazil are associated with variations in organ proliferation symptoms of MBSP-infected maize plants. PMID:28069632
Jiang, Yujia; Lu, Jiasheng; Chen, Tianpeng; Yan, Wei; Dong, Weiliang; Zhou, Jie; Zhang, Wenming; Ma, Jiangfeng; Jiang, Min; Xin, Fengxue
2018-05-23
A novel butanogenic Clostridium sp. NJ4 was successfully isolated and characterized, which could directly produce a relatively high titer of butanol from inulin through consolidated bioprocessing (CBP). The assembled draft genome of strain NJ4 is 4.09 Mb, containing 3891 encoded protein sequences with a G+C content of 30.73%. Among these annotated genes, a levanase, a hypothetical inulinase, and two bifunctional alcohol/aldehyde dehydrogenases (AdhE) were found to play key roles in the achievement of ABE production from inulin through CBP.
Molecular microbial diversity of a spacecraft assembly facility
NASA Technical Reports Server (NTRS)
Venkateswaran, K.; Satomi, M.; Chung, S.; Kern, R.; Koukol, R.; Basic, C.; White, D.
2001-01-01
In ongoing investigations to map and archive the microbial footprints in various components of the spacecraft and its accessories, we have examined the microbial populations of the Jet Propulsion Laboratory's Spacecraft Assembly Facility (JPL-SAF). Witness plates made up of spacecraft materials, some painted with spacecraft qualified paints, were exposed for approximately 7 to 9 months at JPL-SAF, and the particulate materials collected were examined for the incidence of total cultivable aerobic heterotrophs and heat-tolerant (80 degrees C for 15 min) spore-formers. The results showed that the witness plates coated with spacecraft qualified paints attracted more dust particles than the non-coated stainless steel witness plates. Among the four paints tested, witness plates coated with NS43G accumulated the highest number of particles, and hence attracted more cultivable microbes. The conventional microbiological examination revealed that the JPL-SAF harbors mainly Gram-positive microbes and mostly spore-forming Bacillus species. Most of the isolated microbes were heat resistant to 80 degrees C and proliferated at 60 degrees C. The phylogenetic relationships among 23 cultivable heat-tolerant microbes were examined using a battery of morphological, physiological, molecular and chemotaxonomic characterizations. By 16S rDNA sequence analysis, the isolates fell into seven clades: Bacillus licheniformis, B. pumilus, B. cereus, B. circulans, Staphylococcus capitis, Planococcus sp. and Micrococcus lylae. In contrast to the cultivable approach, direct DNA isolation, cloning and 16S rDNA sequencing analysis revealed equal representation of both Gram-positive and Gram-negative microorganisms.
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao
2014-09-01
Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, all known angiosperm chloroplast genomes available to date were analysed, and then we designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and their complete chloroplast genomes were assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs provide a possibility to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.
Hall, Barry G.
2017-01-01
ABSTRACT Strict infection control practices have been implemented for health care visits by cystic fibrosis (CF) patients in an attempt to prevent transmission of important pathogens. This study used whole-genome sequencing (WGS) to determine strain relatedness and assess the population dynamics of Staphylococcus aureus isolates from a cohort of CF patients. A total of 311 S. aureus isolates were collected from respiratory cultures of 115 CF patients during a 22-month study period. Whole-genome sequencing was performed, and using single nucleotide polymorphism (SNP) analysis, phylogenetic trees were assembled to determine relatedness between isolates. Methicillin-resistant Staphylococcus aureus (MRSA) phenotypes were predicted using PPFS2 and compared to the observed phenotype. The accumulation of SNPs in multiple isolates obtained over time from the same patient was examined to determine if a genomic molecular clock could be calculated. Pairs of isolates with ≤71 SNP differences were considered to be the “same” strain. All of the “same” strain isolates were either from the same patient or from sibling pairs. There were 47 examples of patients being superinfected with an unrelated strain. The predicted MRSA phenotype was accurate in all but three isolates. Mutation rates could not be determined because the branching order in the phylogenetic tree was inconsistent with the order of isolation. The observation that transmissions were identified between sibling patients shows that WGS is an effective tool for determining transmission between patients. The observation that transmission only occurred between siblings suggests that Staphylococcus aureus acquisition in our CF population occurred outside the hospital environment and indicates that current infection prevention efforts appear effective. PMID:28446577
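The "same strain" rule (pairs of isolates with ≤71 SNP differences) can be sketched as a pairwise comparison. This is only an illustration, not the authors' pipeline: it assumes each isolate is represented by an equal-length aligned sequence and that ambiguous bases (N) and gaps are ignored, neither of which is stated in the abstract. The isolate names and sequences are hypothetical, and the toy call uses a threshold of 1 so the tiny sequences give a visible result.

```python
from itertools import combinations

SAME_STRAIN_MAX_SNPS = 71  # threshold reported in the abstract


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has 'N' or '-' are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in "N-" and b not in "N-"
    )


def same_strain_pairs(isolates, max_snps=SAME_STRAIN_MAX_SNPS):
    """Return (name_a, name_b, distance) for pairs within the SNP threshold."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(isolates.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            pairs.append((name_a, name_b, distance))
    return pairs


# Hypothetical toy alignment of three isolates
toy_isolates = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TCGAACCA",
}

print(same_strain_pairs(toy_isolates, max_snps=1))  # [('iso1', 'iso2', 1)]
```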
Bielaszewska, Martina; Karch, Helge; Toth, Ian K.
2012-01-01
Background An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Methodology/Principal Findings Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 ‘positive’ E. coli O104:H4 outbreak and 32 ‘negative’ non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Conclusions/Significance Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. 
We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact. PMID:22496820
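The validation figures reported above (sensitivity, false positive rate, and the effect of combining two primers) can be illustrated with a small calculation. The per-isolate results below are invented for illustration; only the panel sizes (21 outbreak and 32 non-outbreak isolates) come from the abstract. The rule that a combined call is positive only when both primer sets amplify is an assumption of this sketch, not a detail given in the source.

```python
def sensitivity_specificity(calls, truth):
    """Return (sensitivity, specificity) for boolean calls against boolean truth."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Panel sizes from the abstract: 21 outbreak, 32 non-outbreak isolates
truth = [True] * 21 + [False] * 32

# Hypothetical results: each primer set detects every outbreak isolate,
# but cross-reacts with a different subset of non-outbreak isolates.
primer_a = [True] * 21 + [i < 3 for i in range(32)]        # 3 false positives
primer_b = [True] * 21 + [3 <= i < 10 for i in range(32)]  # 7 false positives

# Assumed combination rule: positive only if both primer sets amplify
combined = [a and b for a, b in zip(primer_a, primer_b)]

for name, calls in [("A", primer_a), ("B", primer_b), ("A and B", combined)]:
    sens, spec = sensitivity_specificity(calls, truth)
    print(f"{name}: sensitivity={sens:.2f}, false positive rate={1 - spec:.2f}")
```

With these invented results, each primer set alone keeps 100% sensitivity but has a nonzero false positive rate, while the combination removes the false positives because the two sets cross-react with different non-outbreak isolates.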
Mukai, Motoko; Gonser, Rusty A.; Wingfield, John C.; London, Sarah E.; Tuttle, Elaina M.; Clayton, David F.
2014-01-01
Emberizid sparrows (Emberizidae) have played a prominent role in the study of avian vocal communication and social behavior. We present here brain transcriptomes for three emberizid model systems, song sparrow Melospiza melodia, white-throated sparrow Zonotrichia albicollis, and Gambel’s white-crowned sparrow Zonotrichia leucophrys gambelii. Each of the assemblies covered, fully or in part, over 89% of the previously annotated protein coding genes in the zebra finch Taeniopygia guttata, with 16,846, 15,805, and 16,646 unique BLAST hits in song, white-throated and white-crowned sparrows, respectively. As in previous studies, we find tissue of origin (auditory forebrain versus hypothalamus and whole brain) to be an important determinant of overall expression profile. We also demonstrate the successful isolation of RNA and RNA-sequencing from post-mortem samples from building strikes and suggest that such an approach could be useful when traditional sampling opportunities are limited. These transcriptomes will be an important resource for the study of social behavior in birds and for data driven annotation of forthcoming whole genome sequences for these and other bird species. PMID:24883256
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.
Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew
2012-12-20
The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui
2015-03-14
Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.
NASA Technical Reports Server (NTRS)
Birmele, Michele N.
2011-01-01
The Regenerative Environmental Control and Life Support System (ECLSS) on the International Space Station (ISS) includes the Water Recovery System (WRS) and the Oxygen Generation System (OGS). The WRS consists of a Urine Processor Assembly (UPA) and Water Processor Assembly (WPA). This report describes microbial characterization of wastewater and surface samples collected from the WRS and OGS subsystems, returned to KSC, JSC, and MSFC on consecutive shuttle flights (STS-129 and STS-130) in 2009-10. STS-129 returned two filters that contained fluid samples from the WPA Waste Tank Orbital Recovery Unit (ORU), one from the waste tank and the other from the ISS humidity condensate. Direct count by microscopic enumeration revealed 8.38 x 10^4 cells per mL in the humidity condensate sample, but none of those cells were recoverable on solid agar media. In contrast, 3.32 x 10^5 cells per mL were measured from a surface swab of the WRS waste tank, including viable bacteria and fungi recovered after S12 days of incubation on solid agar media. Based on rDNA sequencing and phenotypic characterization, a fungus recovered from the filter was determined to be Lecythophora mutabilis. The bacterial isolate was identified by rDNA sequence data to be Methylobacterium radiotolerans. Additional UPA subsystem samples were returned on STS-130 for analysis. Both liquid and solid samples were collected from the Russian urine container (EDV), Distillation Assembly (DA) and Recycle Filter Tank Assembly (RFTA) for post-flight analysis. The bacterium Pseudomonas aeruginosa and fungus Chaetomium brasiliense were isolated from the EDV samples. No viable bacteria or fungi were recovered from RFTA brine samples (N = 6), but multiple samples (N = 11) from the DA and RFTA were found to contain fungal and bacterial cells. Many recovered cells have been identified to genus by rDNA sequencing and carbon source utilization profiling (BiOLOG Gen III).
The presence of viable bacteria and fungi from WRS and OGS subsystems demonstrates the need for continued monitoring of ECLSS during future ISS operations and investigation of advanced antimicrobial controls.
Morphological Identification and Single-Cell Genomics of Marine Diplonemids.
Gawryluk, Ryan M R; Del Campo, Javier; Okamoto, Noriko; Strassert, Jürgen F H; Lukeš, Julius; Richards, Thomas A; Worden, Alexandra Z; Santoro, Alyson E; Keeling, Patrick J
2016-11-21
Recent global surveys of marine biodiversity have revealed that a group of organisms known as "marine diplonemids" constitutes one of the most abundant and diverse planktonic lineages [1]. Though discovered over a decade ago [2, 3], their potential importance was unrecognized, and our knowledge remains restricted to a single gene amplified from environmental DNA, the 18S rRNA gene (small subunit [SSU]). Here, we use single-cell genomics (SCG) and microscopy to characterize ten marine diplonemids, isolated from a range of depths in the eastern North Pacific Ocean. Phylogenetic analysis confirms that the isolates reflect the entire range of marine diplonemid diversity, and comparisons to environmental SSU surveys show that sequences from the isolates range from rare to superabundant, including the single most common marine diplonemid known. SCG generated a total of ∼915 Mbp of assembled sequence across all ten cells and ∼4,000 protein-coding genes with homologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology database, distributed across categories expected for heterotrophic protists. Models of highly conserved genes indicate a high density of non-canonical introns, lacking conventional GT-AG splice sites. Mapping metagenomic datasets [4] to SCG assemblies reveals virtually no overlap, suggesting that nuclear genomic diversity is too great for representative SCG data to provide meaningful phylogenetic context to metagenomic datasets. This work provides an entry point to the future identification, isolation, and cultivation of these elusive yet ecologically important cells. The high density of nonconventional introns, however, also portends difficulty in generating accurate gene models and highlights the need for the establishment of stable cultures and transcriptomic analyses. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ibarra-Laclette, Enrique; Sánchez-Rangel, Diana; Hernández-Domínguez, Eric; Pérez-Torres, Claudia-Anahí; Ortiz-Castro, Randy; Villafán, Emanuel; Alonso-Sánchez, Alexandro; Rodríguez-Haas, Benjamín; López-Buenfil, Abel; García-Avila, Clemente; Ramírez-Pool, José-Abrahán
2017-08-31
Here, we report the genome of Fusarium euwallaceae strain HFEW-16-IV-019, an isolate obtained from Kuroshio shot hole borer (a Euwallacea sp.). These beetles were collected in Tijuana, Mexico, from elm trees showing typical symptoms of Fusarium dieback. The final assembly consists of 287 scaffolds spanning 48,274,071 bp and 13,777 genes. Copyright © 2017 Ibarra-Laclette et al.
A vertebrate case study of the quality of assemblies derived from next-generation sequences
2011-01-01
The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references. PMID:21453517
Microbial characterization of the Mars Odyssey spacecraft and its encapsulation facility.
La Duc, Myron T; Nicholson, Wayne; Kern, Roger; Venkateswaran, Kasthuri
2003-10-01
Microbial characterization of the Mars Odyssey spacecraft and the Kennedy Space Center Spacecraft Assembly and Encapsulation Facility II (SAEF-II) was carried out by both culture-based and molecular methods. The most dominant cultivable microbes were species of Bacillus, with comamonads, microbacteria and actinomycetales also represented. Several spore-forming isolates were resistant to gamma-radiation, UV, H2O2 and desiccation, and one Acinetobacter radioresistens isolate and several Aureobasidium, isolated directly from the spacecraft, survived various conditions. Sequences arising in clone libraries were fairly consistent between the spacecraft and facility; predominant genera included Variovorax, Ralstonia and Aquaspirillum. This study improves our understanding of the microbial community structure, diversity and survival capabilities of microbes in an encapsulation facility and physically associated with colocated spacecraft.
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
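For planning purposes, read depth relates to read count through the standard relation depth = (number of reads × read length) / genome size. The sketch below turns the 50X and 100X figures into approximate read counts for the three genome sizes named above. The 100 bp read length is an assumption chosen for illustration; the abstract does not state the read length used.

```python
def reads_needed(genome_size_bp, depth, read_length_bp):
    """Smallest number of reads giving at least `depth`-fold average coverage."""
    # Integer ceiling division avoids floating-point rounding
    return -(-genome_size_bp * depth // read_length_bp)


# Genome sizes from the abstract, in base pairs
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH_BP = 100  # assumed read length, not given in the abstract

for name, size in genomes.items():
    print(name, reads_needed(size, 50, READ_LENGTH_BP), reads_needed(size, 100, READ_LENGTH_BP))
```

Under the assumed read length, 50X coverage of the 4.6 Mb genome corresponds to 2,300,000 reads, and doubling the target depth doubles the read count.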
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
2007-01-01
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
preAssemble: a tool for automatic sequencer trace data processing.
Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V
2006-01-17
Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool for performing quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just several trace files as for large scale sequence processing.
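For readers unfamiliar with the quality values that Phred-style base callers attach to each base, the sketch below shows the standard definition Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The read-level filter and its Q20 cutoff are assumptions chosen for illustration; they are not preAssemble's actual criteria, which the abstract does not specify.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality value."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def passes_quality(quals: Sequence[float], min_q: float = 20) -> bool:
    """Hypothetical filter: mean error probability must not exceed that of min_q."""
    if not quals:
        return False
    mean_error = sum(phred_to_error_prob(q) for q in quals) / len(quals)
    return mean_error <= phred_to_error_prob(min_q)
```

Averaging error probabilities rather than the quality values themselves avoids overrating reads that mix a few very poor bases with many good ones.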
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH-contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired-end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7% G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...
2015-08-15
Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M
2013-01-01
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Roisin, S; Gaudin, C; De Mendonça, R; Bellon, J; Van Vaerenbergh, K; De Bruyne, K; Byl, B; Pouseele, H; Denis, O; Supply, P
2016-06-01
We used a two-step whole genome sequencing analysis for resolving two concurrent outbreaks in two neonatal services in Belgium, caused by exfoliative toxin A-encoding-gene-positive (eta+) methicillin-susceptible Staphylococcus aureus with an otherwise sporadic spa-type t209 (ST-109). Outbreak A involved 19 neonates and one healthcare worker in a Brussels hospital from May 2011 to October 2013. After a first episode interrupted by decolonization procedures applied over 7 months, the outbreak resumed concomitantly with the onset of outbreak B in a hospital in Asse, comprising 11 neonates and one healthcare worker from mid-2012 to January 2013. Pan-genome multilocus sequence typing, defined on the basis of 42 core and accessory reference genomes, and single-nucleotide polymorphisms mapped on an outbreak-specific de novo assembly were used to compare 28 available outbreak isolates and 19 eta+/spa-type t209 isolates identified by routine or nationwide surveillance. Pan-genome multilocus sequence typing showed that the outbreaks were caused by independent clones not closely related to any of the surveillance isolates. Isolates from only ten cases with overlapping stays in outbreak A, including four pairs of twins, showed no or only a single nucleotide polymorphism variation, indicating limited sequential transmission. Detection of larger genomic variation, even from the start of the outbreak, pointed to sporadic seeding from a pre-existing exogenous source, which persisted throughout the whole course of outbreak A. Whole genome sequencing analysis can provide unique fine-tuned insights into transmission pathways of complex outbreaks even at their inception, which, with timely use, could valuably guide efforts for early source identification. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Altman, D. R.; Sebra, R.; Hand, J.; Attie, O.; Deikus, G.; Carpini, K. W. D.; Patel, G.; Rana, M.; Arvelakis, A.; Grewal, P.; Dutta, J.; Rose, H.; Shopsin, B.; Daefler, S.; Schadt, E.; Kasarskis, A.; van Bakel, H.; Bashir, A.; Huprikar, S.
2015-01-01
Donor-derived bacterial infection is a recognized complication of solid organ transplantation (SOT). The present report describes the clinical details and successful outcome in a liver transplant recipient despite transmission of methicillin-resistant Staphylococcus aureus (MRSA) from a deceased donor with MRSA endocarditis and bacteremia. We further describe whole genome sequencing (WGS) and complete de novo assembly of the donor and recipient MRSA isolate genomes, which confirms that both isolates are genetically 100% identical. We propose that similar application of WGS techniques to future investigations of donor bacterial transmission would strengthen the definition of proven bacterial transmission in SOT, particularly in the presence of highly clonal bacteria such as MRSA. WGS will further improve our understanding of the epidemiology of bacterial transmission in SOT and the risk of adverse patient outcomes when it occurs. PMID:25250641
Simões-Araújo, Jean Luiz; Rumjanek, Norma Gouvêa; Xavier, Gustavo Ribeiro; Zilli, Jerri Édson
The strain BR 3351T (Bradyrhizobium manausense) was obtained from nodules of cowpea (Vigna unguiculata L. Walp) growing in soil collected from the Amazon rainforest. The strain has a high capacity to fix nitrogen in symbiosis with cowpea. We report here the draft genome sequence of strain BR 3351T. The information presented will be important for comparative analysis of nodulation and nitrogen fixation in diazotrophic bacteria. A draft genome of 9,145,311 bp with 62.9% GC content was assembled in 127 scaffolds from 100 bp paired-end reads generated on the Illumina MiSeq system. RAST annotation identified 8603 coding sequences and 51 RNA genes, classified into 504 subsystems. Published by Elsevier Editora Ltda.
Gong, Yu-Nong; Chen, Guang-Wu; Yang, Shu-Li; Lee, Ching-Ju; Shih, Shin-Ru; Tsao, Kuo-Chien
2016-01-01
Forty-two cytopathic effect (CPE)-positive isolates were collected from 2008 to 2012. None of the isolates could be identified as a known viral pathogen by routine diagnostic assays. They were pooled into 8 groups of 5-6 isolates to reduce the sequencing cost. Next-generation sequencing (NGS) was conducted for each group of mixed samples, and the proposed data analysis pipeline was used to identify viral pathogens in these mixed samples. Polymerase chain reaction (PCR) or enzyme-linked immunosorbent assay (ELISA) was individually conducted for each of these 42 isolates depending on the predicted viral types in each group. Two isolates remained unknown after these tests. Iteration mapping was then implemented for each of these 2 isolates, and predicted human parechovirus (HPeV) in both. In summary, our NGS pipeline detected the following viruses among the 42 isolates: 29 human rhinoviruses (HRVs), 10 HPeVs, 1 human adenovirus (HAdV), 1 echovirus and 1 rotavirus. We then focused on the 10 identified Taiwanese HPeVs because of their reported clinical significance over HRVs. Their genomes were assembled and their genetic diversity was explored. One novel 6-bp deletion was found in one HPeV-1 virus. In terms of nucleotide heterogeneity, 64 genetic variants were detected from these HPeVs using the mapped NGS reads. Most importantly, a recombination event was found between our HPeV-3 and a known HPeV-4 strain in the database. A similar event was detected in the other HPeV-3 strains in the same clade of the phylogenetic tree. These findings demonstrated that the proposed NGS data analysis pipeline identified unknown viruses from the mixed clinical samples, revealed their genetic identity and variants, and characterized their genetic features in terms of viral evolution.
Proline-poor hydrophobic domains modulate the assembly and material properties of polymeric elastin.
Muiznieks, Lisa D; Reichheld, Sean E; Sitarz, Eva E; Miao, Ming; Keeley, Fred W
2015-10-01
Elastin is a self-assembling extracellular matrix protein that provides elasticity to tissues. For entropic elastomers such as elastin, conformational disorder of the monomer building block, even in the polymeric form, is essential for elastomeric recoil. The highly hydrophobic monomer employs a range of strategies for maintaining disorder and flexibility within hydrophobic domains, particularly involving a minimum compositional threshold of proline and glycine residues. However, the native sequence of hydrophobic elastin domain 30 is uncharacteristically proline-poor and, as an isolated polypeptide, is susceptible to formation of amyloid-like structures comprised of stacked β-sheet. Here we investigated the biophysical and mechanical properties of multiple sets of elastin-like polypeptides designed with different numbers of proline-poor domain 30 from human or rat tropoelastins. We compared the contributions of these proline-poor hydrophobic sequences to self-assembly through characterization of phase separation, and to the tensile properties of cross-linked, polymeric materials. We demonstrate that length of hydrophobic domains and propensity to form β-structure, both affecting polypeptide chain flexibility and cross-link density, play key roles in modulating elastin mechanical properties. This study advances the understanding of elastin sequence-structure-function relationships, and provides new insights that will directly support rational approaches to the design of biomaterials with defined suites of mechanical properties. © 2015 Wiley Periodicals, Inc.
Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight
2004-05-01
Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.
Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P
1998-10-20
Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.
SCARF: maximizing next-generation EST assemblies for evolutionary and population genomic analyses.
Barker, Michael S; Dlugosch, Katrina M; Reddy, A Chaitanya C; Amyotte, Sarah N; Rieseberg, Loren H
2009-02-15
Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. The program was created to knit together 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is freely available at http://msbarker.com/software.htm, and is released under the open source GPLv3 license (http://www.opensource.org/licenses/gpl-3.0.html).
Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma.
Borozan, Ivan; Zapatka, Marc; Frappier, Lori; Ferretti, Vincent
2018-01-15
Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis. IMPORTANCE Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. 
Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis. Copyright © 2018 American Society for Microbiology.
Sequencing and assembly of the 22-gb loblolly pine genome.
Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H
2014-03-01
Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
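The N50 scaffold size quoted above is a contiguity metric: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative only and are not data from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembly that wrongly joins two scaffolds raises the N50.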
Assembly: a resource for assembled genomes at NCBI
Kitts, Paul A.; Church, Deanna M.; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G.; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D.; Pruitt, Kim D.; Kimchi, Avi
2016-01-01
The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site. PMID:26578580
Draft genome sequence of non-shiga toxin-producing Escherichia coli O157 NCCP15738.
Kwon, Taesoo; Kim, Jung-Beom; Bak, Young-Seok; Yu, Young-Bin; Kwon, Ki Sung; Kim, Won; Cho, Seung-Hak
2016-01-01
The non-shiga toxin-producing Escherichia coli (non-STEC) O157 is a pathogenic strain that causes diarrhea but does not cause hemolytic-uremic syndrome or hemorrhagic colitis. Here, we present the 5-Mb draft genome sequence of non-STEC O157 NCCP15738, which was isolated from the feces of a Korean patient with diarrhea, and describe its features and the structural basis for its genome evolution. A total of 565-Mbp paired-end reads were generated using the Illumina-HiSeq 2000 platform. The reads were assembled into 135 scaffolds by de novo assembly. The assembled genome size of NCCP15738 was 5,005,278 bp with an N50 value of 142,450 bp and 50.65% G+C content. Using Rapid Annotation using Subsystem Technology analysis, we predicted 4780 ORFs and 31 RNA genes. The evolutionary tree was inferred from multiple sequence alignment of 45 E. coli species. The most closely related neighbor of NCCP15738 indicated by whole-genome phylogeny was E. coli UMNK88, but that indicated by multilocus sequence analysis was E. coli DH1(ME8569). A comparison between the NCCP15738 genome and those of the reference strains E. coli K-12 substr. MG1655 and EHEC O157:H7 EDL933 by bioinformatics analyses revealed unique genes in NCCP15738 associated with lysis protein S, two-component signal transduction system, conjugation, the flagellum, nucleotide-binding proteins, and metal-ion binding proteins. Notably, NCCP15738 has a dual flagella system like that in Vibrio parahaemolyticus, Aeromonas spp., and Rhodospirillum centenum. The draft genome sequence and the results of bioinformatics analysis of NCCP15738 provide the basis for understanding the genomic evolution of this strain.
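The G+C content reported for draft genomes such as this one is the percentage of unambiguous bases that are G or C. The sketch below shows one way to compute it across a set of scaffolds; ignoring ambiguous bases such as N (scaffold gaps) is an assumption of this example, and annotation tools may handle them differently.

```python
from typing import Iterable


def gc_content(sequences: Iterable[str]) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases in all sequences."""
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        g_and_c = seq.count("G") + seq.count("C")
        gc += g_and_c
        acgt += g_and_c + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    # Toy scaffolds; the run of N stands in for a gap and is not counted.
    print(round(gc_content(["ATGCGC", "GGNNNNAT"]), 2))
```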
A New Zamilon-like Virophage Partial Genome Assembled from a Bioreactor Metagenome
Bekliz, Meriem; Verneau, Jonathan; Benamar, Samia; Raoult, Didier; La Scola, Bernard; Colson, Philippe
2015-01-01
Virophages replicate within viral factories inside the Acanthamoeba cytoplasm, and decrease the infectivity and replication of their associated giant viruses. Culture isolation and metagenome analyses have suggested that they are common in our environment. By screening metagenomic databases in search of amoebal viruses, we detected virophage-related sequences among sequences generated from the same non-aerated bioreactor metagenome as recently screened by another team for virophage capsid-encoding genes. We describe here the assembled partial genome of a virophage closely related to Zamilon, which infects Acanthamoeba with mimiviruses of lineages B and C but not A. Searches for sequences related to amoebal giant viruses, other Megavirales representatives and virophages were conducted using BLAST against this bioreactor metagenome (PRJNA73603). Comparative genomic and phylogenetic analyses were performed using sequences from previously identified virophages. A total of 72 metagenome contigs generated from the bioreactor were identified as best matching with sequences from Megavirales representatives, mostly Pithovirus sibericum, pandoraviruses and amoebal mimiviruses from three lineages A–C, as well as from virophages. In addition, a partial genome from a Zamilon-like virophage, which we named Zamilon 2, was assembled. This genome has a size of 6716 base pairs, corresponding to 39% of the Zamilon genome, and comprises partial or full-length homologs for 15 Zamilon predicted open reading frames (ORFs). Mean nucleotide and amino acid identities for these 15 Zamilon 2 ORFs with their Zamilon counterparts were 89% (range, 81–96%) and 91% (range, 78–99%), respectively. Notably, these ORFs included two encoding a capsid protein and a packaging ATPase. Comparative genomics and phylogenetic analyses indicated that the partial genome was that of a new Zamilon-like virophage. 
Further studies are needed to gain better knowledge of the tropism and prevalence of virophages in our biosphere and in humans. PMID:26640459
Scar-less multi-part DNA assembly design automation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hillson, Nathan J.
The present invention provides a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.
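The three steps listed in this abstract can be illustrated with a toy sketch: take an ordered list of fragments, derive a forward and a reverse oligo for each, and prepend flanking sequence copied from the neighbouring fragments so that adjacent amplified parts share homology. This is not the patented method. The function names, the fixed 20-base annealing and homology lengths, and the linear (non-circular) layout are all assumptions made for illustration; a real design would also weigh melting temperature, secondary structure and synthesis cost.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class OligoPair(NamedTuple):
    forward: str  # upstream homology + start of the fragment
    reverse: str  # reverse complement of (end of the fragment + downstream homology)


def design_oligos(fragments: List[str], anneal: int = 20, homology: int = 20) -> List[OligoPair]:
    """Hypothetical planner: one oligo pair per fragment, in assembly order."""
    if anneal <= 0 or homology <= 0:
        raise ValueError("anneal and homology must be positive")
    plan = []
    last = len(fragments) - 1
    for i, frag in enumerate(fragments):
        # Flanks come from the neighbouring fragments; the ends of a linear
        # assembly have no neighbour and therefore get no flank.
        upstream = fragments[i - 1][-homology:] if i > 0 else ""
        downstream = fragments[i + 1][:homology] if i < last else ""
        forward = upstream + frag[:anneal]
        reverse = reverse_complement(frag[-anneal:] + downstream)
        plan.append(OligoPair(forward, reverse))
    return plan
```

In this layout the forward oligo of one fragment and the reverse oligo of its upstream neighbour are reverse complements of each other when the annealing and homology lengths match, which is what gives adjacent products a shared junction sequence.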
Brumm, Phillip J; Land, Miriam L; Mead, David A
2015-01-01
Geobacillus thermoglucosidasius C56-YS93 was one of several thermophilic organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. Comparison of 16S rRNA sequences confirmed the classification of the strain as a G. thermoglucosidasius species. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). The genome of G. thermoglucosidasius C56-YS93 consists of one circular chromosome of 3,893,306 bp and two circular plasmids of 80,849 and 19,638 bp, with an average G+C content of 43.93%. G. thermoglucosidasius C56-YS93 possesses a xylan degradation cluster not found in the other G. thermoglucosidasius sequenced strains. This cluster appears to be related to the xylan degradation cluster found in G. stearothermophilus. G. thermoglucosidasius C56-YS93 possesses two plasmids not found in the other two strains. One plasmid contains a novel gene cluster coding for proteins involved in proline degradation and metabolism; the other contains a collection of mostly hypothetical proteins.
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Chiriac, Cecilia; Baricz, Andreea
2018-01-01
ABSTRACT The draft genome assembly of Janthinobacterium sp. strain ROICE36 has 207 contigs, with a total genome size of 5,977,006 bp and a G+C content of 62%. Preliminary genome analysis identified 5,363 protein-coding genes and a total of 7 secondary metabolic gene clusters (encoding bacteriocins, nonribosomal peptide-synthetase [NRPS], terpene, hserlactone, and other ketide synthases). PMID:29650588
Kim, Jung A; Jeon, Jongbum; Kim, Ki-Tae; Choi, Gobong; Park, Sook-Young; Lee, Hyun-Jung; Shim, Sang-Hee; Lee, Yong-Hwan; Kim, Soonok
2017-08-03
An endophytic fungus, Gaeumannomyces sp. strain JS-464, is capable of producing a number of secondary metabolites which showed significant nitric oxide reduction activity. The draft genome assembly has a size of 53,151,282 bp, with a G+C content of 53.11%, and consists of 80 scaffolds with an N50 of 7.46 Mbp. Copyright © 2017 Kim et al.
Draft Genome Sequence of Desulfovibrio BerOc1, a Mercury-Methylating Strain.
Goñi Urriza, Marisol; Gassie, Claire; Bouchez, Oliver; Klopp, Christophe; Guyoneaud, Rémy
2017-01-19
Desulfovibrio BerOc1 is a sulfate-reducing bacterium isolated from the Berre lagoon (French Mediterranean coast). BerOc1 is able to methylate and demethylate mercury. The genome size is 4,081,579 bp assembled into five contigs. We identified the hgcA and hgcB genes involved in mercury methylation, but not those responsible for mercury demethylation. Copyright © 2017 Goñi Urriza et al.
Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W
2018-05-01
The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
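The idea of using mate-pairs from a library of known span to check an assembly can be sketched as follows. This is only an illustration under stated assumptions: the `MatePair` record, the 25% tolerance, and the four category names are hypothetical and are not taken from the study; read orientation is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Mapped positions of the two reads of one mate-pair (hypothetical record)."""
    contig1: str
    pos1: int
    contig2: str
    pos2: int


def classify_pair(pair: MatePair, expected_span: int = 38_000, tolerance: float = 0.25) -> str:
    """Compare the mapped distance of a mate-pair with the library span.

    expected_span defaults to the 38-kb Fosill library size mentioned in
    the abstract; the tolerance is an assumed value, not one from the paper.
    """
    if pair.contig1 != pair.contig2:
        return "inter-scaffold"  # mates land on different sequences
    span = abs(pair.pos2 - pair.pos1)
    low = expected_span * (1 - tolerance)
    high = expected_span * (1 + tolerance)
    if span < low:
        return "too-close"  # suggests sequence missing from or collapsed in the assembly
    if span > high:
        return "too-far"  # suggests an insertion or mis-join in the assembly
    return "concordant"


examples = [
    MatePair("chr1A", 1_000, "chr1A", 39_000),
    MatePair("chr1A", 1_000, "chr1A", 11_000),
    MatePair("chr1A", 0, "chr1A", 100_000),
    MatePair("chr1A", 500, "chr2B", 700),
]
labels = [classify_pair(p) for p in examples]
```

A single discordant pair proves little; in practice one would look for clusters of discordant pairs supporting the same discrepancy before flagging a region.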
Transcriptomic survey of the midgut of Anthonomus grandis (Coleoptera: Curculionidae).
Salvador, Ricardo; Príncipi, Darío; Berretta, Marcelo; Fernández, Paula; Paniego, Norma; Sciocco-Cap, Alicia; Hopp, Esteban
2014-01-01
Anthonomus grandis Boheman is a key pest in cotton crops in the New World. Its larval stage develops within the flower bud using it as food and as protection against its predators. This behavior limits the effectiveness of its control using conventional insecticide applications and biocontrol techniques. In spite of its importance, little is known about its genome sequence and, more importantly, its specific expression in key organs like the midgut. Total mRNA isolated from larval midguts was used for pyrosequencing. Sequence reads were assembled and annotated to generate a unigene data set. In total, 400,000 reads from A. grandis midgut with an average length of 237 bp were assembled and combined into 20,915 contigs. The assembled reads fell into 6,621 gene models. BlastX search using the NCBI-NR database showed that 3,006 unigenes had significant matches to known sequences. Gene Ontology (GO) mapping analysis showed that A. grandis expresses transcripts coding for proteins involved in catalytic processing of macromolecules, which allows its adaptation to very different feeding source scenarios. Furthermore, transcripts encoding proteins involved in detoxification mechanisms such as p450 genes, glutathione-S-transferase, and carboxylesterases are also expressed. This is the first report of a transcriptomic study in A. grandis and the largest set of sequence data reported for this species. These data are valuable resources to expand the knowledge of this insect group and could be used in the design of new control strategies based on molecular information. © The Author 2014. Published by Oxford University Press on behalf of the Entomological Society of America.
Transcriptomic Survey of the Midgut of Anthonomus grandis (Coleoptera: Curculionidae)
Salvador, Ricardo; Príncipi, Darío; Berretta, Marcelo; Fernández, Paula; Paniego, Norma; Sciocco-Cap, Alicia; Hopp, Esteban
2014-01-01
Abstract Anthonomus grandis Boheman is a key pest in cotton crops in the New World. Its larval stage develops within the flower bud using it as food and as protection against its predators. This behavior limits the effectiveness of its control using conventional insecticide applications and biocontrol techniques. In spite of its importance, little is known about its genome sequence and, more importantly, its specific expression in key organs like the midgut. Total mRNA isolated from larval midguts was used for pyrosequencing. Sequence reads were assembled and annotated to generate a unigene data set. In total, 400,000 reads from A. grandis midgut with an average length of 237 bp were assembled and combined into 20,915 contigs. The assembled reads fell into 6,621 gene models. BlastX search using the NCBI-NR database showed that 3,006 unigenes had significant matches to known sequences. Gene Ontology (GO) mapping analysis showed that A. grandis expresses transcripts coding for proteins involved in catalytic processing of macromolecules, which allows its adaptation to very different feeding source scenarios. Furthermore, transcripts encoding proteins involved in detoxification mechanisms such as p450 genes, glutathione-S-transferase, and carboxylesterases are also expressed. This is the first report of a transcriptomic study in A. grandis and the largest set of sequence data reported for this species. These data are valuable resources to expand the knowledge of this insect group and could be used in the design of new control strategies based on molecular information. PMID:25473064
Representations of mechanical assembly sequences
NASA Technical Reports Server (NTRS)
Homem De Mello, Luiz S.; Sanderson, Arthur C.
1991-01-01
Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established. The correctness and completeness of these representations are established. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.
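To make one of these representations concrete, the sketch below enumerates the assembly sequences permitted by a set of precedence relationships between connections. It is a simplified illustration, not the authors' formalism: the function name `feasible_sequences`, the connection labels, and the encoding of a precedence relationship as a pair (a, b), meaning "a must be established before b", are assumptions, and geometric or mechanical feasibility is not modelled.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def feasible_sequences(
    connections: Sequence[str], precedes: Set[Tuple[str, str]]
) -> List[Tuple[str, ...]]:
    """Enumerate every ordering of connections that respects the precedences.

    Each partial set of established connections corresponds to a node of a
    directed graph of assembly states; each recursive call follows one edge.
    """
    results: List[Tuple[str, ...]] = []

    def extend(done: FrozenSet[str], order: List[str]) -> None:
        if len(order) == len(connections):
            results.append(tuple(order))
            return
        for c in connections:
            if c in done:
                continue
            # c may be established only after all of its predecessors.
            if all(a in done for (a, b) in precedes if b == c):
                extend(done | {c}, order + [c])

    extend(frozenset(), [])
    return results


# Three connections; c1 must be established before c3.
sequences = feasible_sequences(["c1", "c2", "c3"], {("c1", "c3")})
```

Here three of the six possible orderings survive. The enumeration grows factorially with the number of connections, which is one reason compact representations such as the AND/OR graph are of interest.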
Bacillus horneckiae sp. nov., isolated from a spacecraft-assembly clean room.
Vaishampayan, Parag; Probst, Alexander; Krishnamurthi, Srinivasan; Ghosh, Sudeshna; Osman, Shariff; McDowall, Alasdair; Ruckmani, Arunachalam; Mayilraj, Shanmugam; Venkateswaran, Kasthuri
2010-05-01
Five Gram-stain-positive, motile, aerobic strains were isolated from a clean room of the Kennedy Space Center where the Phoenix spacecraft was assembled. All strains are rod-shaped, spore-forming bacteria, whose spores were resistant to UV radiation up to 1000 J m(-2). The spores were subterminally positioned and produced an external layer. A polyphasic taxonomic study including traditional biochemical tests, fatty acid analysis, cell-wall typing, lipid analyses, 16S rRNA gene sequencing and DNA-DNA hybridization studies was performed to characterize these novel strains. 16S rRNA gene sequencing and lipid analyses convincingly grouped these novel strains within the genus Bacillus as a cluster separate from already described species. The similarity of 16S rRNA gene sequences among the novel strains was >99 %, but the similarity was only about 97 % with their nearest neighbours Bacillus pocheonensis, Bacillus firmus and Bacillus bataviensis. DNA-DNA hybridization dissociation values were <24 % to the closest related type strains. The novel strains had a G+C content 35.6+/-0.5 mol% and could liquefy gelatin but did not utilize or produce acids from any of the carbon substrates tested. The major fatty acids were iso-C(15 : 0) and anteiso-C(15 : 0) and the cell-wall diamino acid was meso-diaminopimelic acid. Based on phylogenetic and phenotypic results, it is concluded that these strains represent a novel species of the genus Bacillus, for which the name Bacillus horneckiae sp. nov. is proposed. The type strain is 1P01SC(T) (=NRRL B-59162(T) =MTCC 9535(T)).
Ostergaard, Elsebet; Weraarpachai, Woranontee; Ravn, Kirstine; Born, Alfred Peter; Jønson, Lars; Duno, Morten; Wibrand, Flemming; Shoubridge, Eric A; Vissing, John
2015-03-01
We investigated a subject with an isolated cytochrome c oxidase (COX) deficiency presenting with an unusual phenotype characterised by neuropathy, exercise intolerance, obesity, and short stature. Blue-native polyacrylamide gel electrophoresis (BN-PAGE) analysis showed an almost complete lack of COX assembly in subject fibroblasts, consistent with the very low enzymatic activity, and pulse-labelling mitochondrial translation experiments showed a specific decrease in synthesis of the COX1 subunit, the core catalytic subunit that nucleates assembly of the holoenzyme. Whole exome sequencing identified compound heterozygous mutations (c.199dupC, c.215A>G) in COA3, a small inner membrane COX assembly factor, resulting in a pronounced decrease in the steady-state levels of COA3 protein. Retroviral expression of a wild-type COA3 cDNA completely rescued the COX assembly and mitochondrial translation defects, confirming the pathogenicity of the mutations, and resulted in increased steady-state levels of COX1 in control cells, demonstrating a role for COA3 in the stabilisation of this subunit. COA3 exists in an early COX assembly complex that contains COX1 and other COX assembly factors including COX14 (C12orf62), another single pass transmembrane protein that also plays a role in coupling COX1 synthesis with holoenzyme assembly. Immunoblot analysis showed that COX14 was undetectable in COA3 subject fibroblasts, and that COA3 was undetectable in fibroblasts from a COX14 subject, demonstrating the interdependence of these two COX assembly factors. The mild clinical course in this patient contrasts with nearly all other cases of severe COX assembly defects that are usually fatal early in life, and underscores the marked tissue-specific involvement in mitochondrial diseases. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Current conducting end plate of fuel cell assembly
Walsh, Michael M.
1999-01-01
A fuel cell assembly has a current conducting end plate with a conductive body formed integrally with isolating material. The conductive body has a first surface, a second surface opposite the first surface, and an electrical connector. The first surface has an exposed portion for conducting current between a working section of the fuel cell assembly and the electrical connector. The isolating material is positioned on at least a portion of the second surface. The conductive body can have support passage(s) extending therethrough for receiving structural member(s) of the fuel cell assembly. Isolating material can electrically isolate the conductive body from the structural member(s). The conductive body can have service passage(s) extending therethrough for servicing one or more fluids for the fuel cell assembly. Isolating material can chemically isolate the one or more fluids from the conductive body. The isolating material can also electrically isolate the conductive body from the one or more fluids.
Bashyal, Bishnu M.; Rawat, Kirti; Sharma, Sapna; Kulshreshtha, Deepika; Gopala Krishnan, S.; Singh, Ashok K.; Dubey, Himanshu; Solanke, Amolkumar U.; Sharma, T. R.; Aggarwal, Rashmi
2017-01-01
Fusarium fujikuroi, which causes bakanae disease, has emerged as one of the major pathogens of rice across the world. The study aims at comparative genomic analysis of Fusarium fujikuroi isolates and identification of the secretory proteins of the fungus involved in rice pathogenesis. In the present study, F. fujikuroi isolate “F250” was sequenced with an assembly size of 42.47 Mb, providing coverage of 96.89% of the reference IMI58289 genome. A total of 13,603 protein-coding genes were predicted from the genome assembly. The average gene density in the F. fujikuroi genome was 315.10 genes per Mb with an average gene length of 1.67 kb. Additionally, 134,374 single nucleotide polymorphisms (SNPs) were identified against the IMI58289 isolate, with an average SNP density of 3.11 per kb of genome. Repetitive elements represent approximately 270,550 bp, which is 0.63% of the total genome. In total, 3,109 simple sequence repeats (SSRs), including 302 compound SSRs, were identified in the 8,656 scaffolds. Comparative analysis of the isolates of F. fujikuroi revealed that they shared a total of 12,240 common clusters, with F250 showing higher similarity with IMI58289. A total of 1,194 secretory proteins were identified in its genome, among which there were 356 genes encoding carbohydrate active enzymes (CAZymes) capable of degrading complex polysaccharides. Among these, glycoside hydrolase (GH) families were most prevalent (41%), followed by carbohydrate esterase (CE). CE8 (4 genes), PL1 (10 genes), PL3 (5 genes), and GH28 (8 genes) were the prominent plant cell wall degrading enzyme families in the F250 secretome. Besides this, 585 genes essential for the pathogen–host interactions were also identified. Selected genes were validated through quantitative real-time PCR analyses in resistant and susceptible genotypes of rice on different days after inoculation. The data offer a better understanding of the F. fujikuroi genome and will help us enhance our knowledge of Fusarium fujikuroi–rice interactions. PMID:29230233
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be elucidated by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbes and differing species abundances, makes metagenome assembly more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
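The two steps, (i) clustering reads by the protein they map to and (ii) building a consensus through probabilistic choices, might look roughly like the sketch below. The abstract does not give the actual algorithm, so everything here is an assumption: the input formats, the function names `cluster_reads` and `consensus`, and the rule of sampling each base in proportion to its frequency in the column.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_reads(hits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group read IDs by the protein each read maps to.

    hits: (read_id, protein_id) pairs, assumed to be best hits only.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for read_id, protein_id in hits:
        clusters[protein_id].append(read_id)
    return dict(clusters)


def consensus(placed_reads: List[Tuple[int, str]], rng: random.Random) -> str:
    """Build a consensus from reads already placed on a shared coordinate.

    placed_reads: (offset, sequence) pairs. At each covered position one
    base is drawn with probability proportional to how often it occurs
    there. Assumes coverage is contiguous; uncovered positions are skipped.
    """
    columns: Dict[int, Counter] = defaultdict(Counter)
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    out = []
    for pos in sorted(columns):
        bases, weights = zip(*sorted(columns[pos].items()))
        out.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(out)


clusters = cluster_reads([("r1", "pA"), ("r2", "pB"), ("r3", "pA")])
contig = consensus([(0, "ACGT"), (2, "GTAA")], random.Random(0))
```

When every read agrees at a position the draw is deterministic, as in the example; where reads disagree, repeated runs with different seeds can yield different contigs.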
Emergence of endemic MLST non-typeable vancomycin-resistant Enterococcus faecium.
Carter, Glen P; Buultjens, Andrew H; Ballard, Susan A; Baines, Sarah L; Tomita, Takehiro; Strachan, Janet; Johnson, Paul D R; Ferguson, John K; Seemann, Torsten; Stinear, Timothy P; Howden, Benjamin P
2016-12-01
Enterococcus faecium is a major nosocomial pathogen causing significant morbidity and mortality worldwide. Assessment of E. faecium using MLST to understand the spread of this organism is an important component of hospital infection control measures. Recent studies, however, suggest that MLST might be inadequate for E. faecium surveillance. To use WGS to characterize recently identified vancomycin-resistant E. faecium (VREfm) isolates non-typeable by MLST that appear to be causing a multi-jurisdictional outbreak in Australia. Illumina NextSeq and Pacific Biosciences SMRT sequencing platforms were used to determine the genome sequences of 66 non-typeable E. faecium (NTEfm) isolates. Phylogenetic and bioinformatics analyses were subsequently performed using a number of in silico tools. Sixty-six E. faecium isolates were identified by WGS from multiple health jurisdictions in Australia that could not be typed by MLST due to a missing pstS allele. SMRT sequencing and complete genome assembly revealed a large chromosomal rearrangement in representative strain DMG1500801, which likely facilitated the deletion of the pstS region. Phylogenomic analysis of this population suggests that deletion of pstS within E. faecium has arisen independently on at least three occasions. Importantly, the majority of these isolates displayed a vancomycin-resistant genotype. We have identified NTEfm isolates that appear to be causing a multi-jurisdictional outbreak in Australia. Identification of these isolates has important implications for MLST-based typing activities designed to monitor the spread of VREfm and provides further evidence supporting the use of WGS for hospital surveillance of E. faecium. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
2013-01-01
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another. PMID:23870653
Enhanced sequencing coverage with digital droplet multiple displacement amplification
Sidore, Angus M.; Lan, Freeman; Lim, Shaun W.; Abate, Adam R.
2016-01-01
Sequencing small quantities of DNA is important for applications ranging from the assembly of uncultivable microbial genomes to the identification of cancer-associated mutations. To obtain sufficient quantities of DNA for sequencing, the small amount of starting material must be amplified significantly. However, existing methods often yield errors or non-uniform coverage, reducing sequencing data quality. Here, we describe digital droplet multiple displacement amplification, a method that enables massive amplification of low-input material while maintaining sequence accuracy and uniformity. The low-input material is compartmentalized as single molecules in millions of picoliter droplets. Because the molecules are isolated in compartments, they amplify to saturation without competing for resources; this yields uniform representation of all sequences in the final product and, in turn, enhances the quality of the sequence data. We demonstrate the ability to uniformly amplify the genomes of single Escherichia coli cells, comprising just 4.7 fg of starting DNA, and obtain sequencing coverage distributions that rival that of unamplified material. Digital droplet multiple displacement amplification provides a simple and effective method for amplifying minute amounts of DNA for accurate and uniform sequencing. PMID:26704978
Single haplotype assembly of the human genome from a hydatidiform mole.
Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K
2014-12-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.
Single haplotype assembly of the human genome from a hydatidiform mole
Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
2014-01-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
Restrepo Restrepo, Silvia; Aristizábal Gutiérrez, Fabio Ancizar; Montoya Castaño, Dolly
2015-01-01
Natural rubber (Hevea brasiliensis) is a tropical tree used commercially for the production of latex, from which 40,000 products are generated. The fungus Microcyclus ulei infects this tree, causing South American leaf blight (SALB) disease. This disease causes developmental delays and significant crop losses, thereby decreasing the production of latex. Currently several groups are working on obtaining clones of rubber tree with durable resistance to SALB through the use of extensive molecular biology techniques. In this study, we used a secondary clone that was resistant to M. ulei isolate GCL012. This clone, FX 3864 was obtained by crossing between clones PB 86 and B 38 (H. brasiliensis x H. brasiliensis). RNA-Seq high-throughput sequencing technology was used to analyze the differential expression of the FX 3864 clone transcriptome at 0 and 48 h post infection (hpi) with the M. ulei isolate GCL012. A total of 158,134,220 reads were assembled using the de novo assembly strategy to generate 90,775 contigs with an N50 of 1672. Using a reference-based assembly, 76,278 contigs were generated with an N50 of 1324. We identified 86 differentially expressed genes associated with the defense response of FX 3864 to GCL012. Seven putative genes members of the AP2/ERF ethylene (ET)-dependent superfamily were found to be down-regulated. An increase in salicylic acid (SA) was associated with the up-regulation of three genes involved in cell wall synthesis and remodeling, as well as in the down-regulation of the putative gene CPR5. The defense response of FX 3864 against the GCL012 isolate was associated with the antagonistic SA, ET and jasmonic acid (JA) pathways. These responses are characteristic of plant resistance to biotrophic pathogens. PMID:26287380
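Several of the records in this listing report an N50 value for contigs or scaffolds. As a reminder of what the statistic means, the sketch below computes it under the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

For the example lengths the total is 32, and the two longest contigs (8 + 8) already reach half of it, so the N50 is 8.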
Hybrid de novo genome assembly of the Chinese herbal fleabane Erigeron breviscapus
Zhang, Guanghui; Zhang, Jing; Liu, Hui; Chen, Wei; Wang, Xiao; Li, Yahe
2017-01-01
Abstract Background: The plants in the Erigeron genus of the Compositae (Asteraceae) family are commonly called fleabanes, possibly due to the belief that certain chemicals in these plants repel fleas. In traditional Chinese medicine, Erigeron breviscapus, which is native to China, was widely used in the treatment of cerebrovascular disease. A handful of bioactive compounds, including scutellarin, 3,5-dicaffeoylquinic acid, and 3,4-dicaffeoylquinic acid, have been isolated from the plant. With the purpose of finding novel medicinal compounds and understanding their biosynthetic pathways, we propose to sequence the genome of E. breviscapus. Findings: We assembled the highly heterozygous E. breviscapus genome using a combination of PacBio single-molecule real-time sequencing and next-generation sequencing methods on the Illumina HiSeq platform. The final draft genome is approximately 1.2 Gb, with contig and scaffold N50 sizes of 18.8 kb and 31.5 kb, respectively. Further analyses predicted 37 504 protein-coding genes in the E. breviscapus genome and 8172 shared gene families among Compositae species. Conclusions: The E. breviscapus genome provides a valuable resource for the investigation of novel bioactive compounds in this Chinese herb. PMID:28431028
Single molecule sequencing-guided scaffolding and correction of draft assemblies.
Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J
2017-12-06
Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
De novo assembly of human genomes with massively parallel short read sequencing.
Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2010-02-01
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
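The N50 values quoted in this and neighbouring abstracts follow the usual definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is given here; the function name and the example lengths are hypothetical and are not taken from any of the cited studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in bp (illustration only).
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

Because the statistic depends only on the length distribution, a higher N50 indicates better contiguity but says nothing about whether the contigs are correctly assembled.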
Tejerizo, Gonzalo Torres; Kim, Yong Sung; Maus, Irena; Wibberg, Daniel; Winkler, Anika; Off, Sandra; Pühler, Alfred; Scherer, Paul; Schlüter, Andreas
2017-04-10
Methanogenic Archaea are of importance at the end of the anaerobic digestion (AD) chain for biomass conversion. They finally produce methane, the end-product of AD. Among this group of microorganisms, members of the genus Methanobacterium are ubiquitously present in anaerobic habitats, such as bioreactors. The genome of a novel methanogenic archaeon, namely Methanobacterium congolense Buetzberg, originally isolated from a mesophilic biogas plant, was completely sequenced to analyze putative adaptive genome features conferring competitiveness of this isolate within the biogas reactor environment. Sequencing and assembly of the M. congolense Buetzberg genome yielded a chromosome with a size of 2,451,457bp and a mean GC-content of 38.51%. Additionally, a plasmid with a size of 18,118bp, featuring a GC content of 36.05% was identified. The M. congolense Buetzberg plasmid showed no sequence similarities with the plasmids described previously suggesting that it represents a new plasmid type. Analysis of the M. congolense Buetzberg chromosome architecture revealed a high collinearity with the Methanobacterium paludis chromosome. Furthermore, annotation of the genome and functional predictions disclosed several genes involved in cell wall and membrane biogenesis. Compilation of specific genes among Methanobacterium strains originating from AD environments revealed 474 genetic determinants that could be crucial for adaptation of these strains to specific conditions prevailing in AD habitats. Copyright © 2017 Elsevier B.V. All rights reserved.
Cooper, Vaughn S.; Hatcher, Philip J.; Verheyde, Bart; Carlier, Aurélien; Vandamme, Peter
2017-01-01
The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence. PMID:28430818
Mottawea, Walid; Duceppe, Marc-Olivier; Dupras, Andrée A; Usongo, Valentine; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Hamel, Jeremie; Kukavica-Ibrulj, Irena; Boyle, Brian; Gill, Alexander; Burnett, Elton; Franz, Eelco; Arya, Gitanjali; Weadge, Joel T; Gruenheid, Samantha; Wiedmann, Martin; Huang, Hongsheng; Daigle, France; Moineau, Sylvain; Bekal, Sadjia; Levesque, Roger C; Goodridge, Lawrence D; Ogunremi, Dele
2018-01-01
Non-typhoidal Salmonella is a leading cause of foodborne illness worldwide. Prompt and accurate identification of the sources of Salmonella responsible for disease outbreaks is crucial to minimize infections and eliminate ongoing sources of contamination. Current subtyping tools including single nucleotide polymorphism (SNP) typing may be inadequate, in some instances, to provide the required discrimination among epidemiologically unrelated Salmonella strains. Prophage genes represent the majority of the accessory genes in bacterial genomes and have potential to be used as high discrimination markers in Salmonella. In this study, the prophage sequence diversity in different Salmonella serovars and genetically related strains was investigated. Using whole genome sequences of 1,760 isolates of S. enterica representing 151 Salmonella serovars and 66 closely related bacteria, prophage sequences were identified from assembled contigs using PHASTER. We detected 154 different prophages in S. enterica genomes. Prophage sequences were highly variable among S. enterica serovars with a median ± interquartile range (IQR) of 5 ± 3 prophage regions per genome. While some prophage sequences were highly conserved among the strains of specific serovars, few regions were lineage specific. Therefore, strains belonging to each serovar could be clustered separately based on their prophage content. Analysis of S. Enteritidis isolates from seven outbreaks generated distinct prophage profiles for each outbreak. Taken altogether, the diversity of the prophage sequences correlates with genome diversity. Prophage repertoires provide an additional marker for differentiating S. enterica subtypes during foodborne outbreaks.
A whole-genome assembly of the domestic cow, Bos taurus
USDA-ARS's Scientific Manuscript database
Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion b...
Chen, Chaoyang; Sun, Chongran; Wu, Yi-Rui
2018-03-21
A wild-type solventogenic strain, Clostridium diolis WST, isolated from mangrove sediments, was found to produce high amounts of butanol and acetone with negligible levels of ethanol and acids from glucose via a unique acetone-butanol (AB) fermentation pathway. Through genomic sequencing, the assembled draft genome of strain WST is calculated to be 5.85 Mb with a GC content of 29.69% and contains 5263 genes that contribute to the annotation of 5049 protein-coding sequences. Within these annotated genes, the butanol dehydrogenase gene (bdh) was determined to be present in a higher amount in strain WST than in other Clostridial strains, which is positively related to its highly efficient production of butanol. We therefore present a draft genome sequence analysis of strain WST in this article, which should facilitate further understanding of the solventogenic mechanism of this special microorganism.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.
Haiminen, Niina; Feltus, F Alex; Parida, Laxmi
2011-04-15
We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
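The 15L-5P regime can be turned into a sequencing budget with the standard relation coverage = total sequenced bases / target length. The sketch below applies it to the 12 Mbp pool size mentioned in the abstract. The mean read length of 400 bp is an assumption made only for illustration; the abstract does not state a read length, and the figures describe bases retained after pre-processing rather than raw plate output.

```python
def bases_needed(region_bp, fold_coverage):
    """Total bases required to cover region_bp at the given fold coverage."""
    return region_bp * fold_coverage


def reads_needed(region_bp, fold_coverage, mean_read_bp):
    """Number of reads of a given mean length needed (rounded up)."""
    total = bases_needed(region_bp, fold_coverage)
    return -(-total // mean_read_bp)  # integer ceiling division


REGION_BP = 12_000_000   # 12 Mbp pool size, from the abstract
MEAN_READ_BP = 400       # hypothetical mean read length (assumption)

linear_bases = bases_needed(REGION_BP, 15)   # 15x linear reads
paired_bases = bases_needed(REGION_BP, 5)    # 5x paired reads
print(linear_bases, paired_bases)
print(reads_needed(REGION_BP, 15, MEAN_READ_BP),
      reads_needed(REGION_BP, 5, MEAN_READ_BP))
```

Under these assumptions the region needs 180 Mbp of linear reads and 60 Mbp of paired reads; the read counts scale inversely with whatever mean read length is actually achieved.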
MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.
Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S
2014-01-01
A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
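The percentages in this abstract can be checked directly from the counts it reports. The short sketch below recomputes the redundancy implied by 33,620 unique genes among 43,141 assembled sequences and the share of assembled sequences with a full-length insert; the helper name is hypothetical, and only numbers stated in the abstract are used.

```python
def fraction(part, whole):
    """Return part / whole as a float, guarding against an empty total."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


ASSEMBLED = 43_141        # putative transcripts, from the abstract
UNIQUE_ESTIMATE = 33_620  # estimated unique genes, from the abstract
FULL_LENGTH = 14_409      # sequences with a full-length insert, from the abstract

redundancy = 1 - fraction(UNIQUE_ESTIMATE, ASSEMBLED)
full_length_share = fraction(FULL_LENGTH, ASSEMBLED)

print(f"redundancy: {redundancy:.1%}")
print(f"full-length share: {full_length_share:.1%}")
```

Both values round to the figures quoted in the text (22% and 33%), so the reported counts and percentages are mutually consistent.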
Consensus generation and variant detection by Celera Assembler.
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger
2008-04-15
We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. When applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
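The mosaic problem described above can be reproduced on a toy alignment. The sketch below is a deliberately simplified illustration, not the Celera Assembler algorithm: it contrasts a per-column majority vote with grouping the reads that span a whole variant window. The reads, the `.` convention for uncovered positions, and all function names are assumptions introduced for this example.

```python
from collections import Counter

GAP = "."  # marks positions a read does not cover (convention of this sketch)


def column_consensus(reads):
    """Majority base per alignment column, ignoring uncovered positions."""
    consensus = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base != GAP)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def variant_columns(reads):
    """Indices of columns where the covering reads disagree."""
    return [i for i, column in enumerate(zip(*reads))
            if len({base for base in column if base != GAP}) > 1]


def window_alleles(reads):
    """Count the distinct read segments spanning all variant columns."""
    cols = variant_columns(reads)
    if not cols:
        return Counter()
    start, end = cols[0], cols[-1] + 1
    return Counter(r[start:end] for r in reads if GAP not in r[start:end])


# Two hypothetical haplotypes differing at columns 1 and 6.
reads = [
    "ACGTACGT", "ACGTACGT",   # haplotype A, spanning both sites
    "ATGTACCT", "ATGTACCT",   # haplotype B, spanning both sites
    "ACGT....",               # haplotype A, covers the first site only
    "....ACCT",               # haplotype B, covers the second site only
]
print(column_consensus(reads))  # mixes the A allele at col 1 with the B allele at col 6
print(window_alleles(reads))
```

In this example the per-column vote yields `ACGTACCT`, which matches neither haplotype and none of the reads that span both variant sites, whereas the window grouping recovers two alleles supported by two spanning reads each.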
Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried
2015-01-01
A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvents the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessing the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
Ferreira, Dalila Souza Santos; Kato, Rodrigo Bentes; Miranda, Fábio Malcher; da Costa Pinheiro, Kenny; Fonseca, Paula Luize Camargos; Tomé, Luiz Marcelo Ribeiro; Vaz, Aline Bruna Martins; Badotti, Fernanda; Ramos, Rommel Thiago Jucá; Brenig, Bertram; Azevedo, Vasco Ariston de Carvalho; Benevides, Raquel Guimarães; Góes-Neto, Aristóteles
2018-06-01
Herein, we present the draft genome of Trametes villosa isolate CCMB561, a wood-decaying basidiomycete commonly found in tropical semiarid climates. The genome assembly was 57.98 Mb in size with an L50 of 691. A total of 16,711 putative protein-encoding genes were predicted, including 590 genes coding for carbohydrate-active enzymes (CAZy), directly involved in the decomposition of lignocellulosic materials. This is the first genome of this species, which is of high interest in bioenergy research. The draft genome of Trametes villosa isolate CCMB561 will provide an important resource for future investigations in biofuel production, bioremediation and other green technologies.
Sharma, Sandeep; Zaccaron, Alex Z; Ridenour, John B; Allen, Tom W; Conner, Kassie; Doyle, Vinson P; Price, Trey; Sikora, Edward; Singh, Raghuwinder; Spurlock, Terry; Tomaso-Peterson, Maria; Wilkerson, Tessie; Bluhm, Burton H
2018-04-01
The draft genome of Xylaria sp. isolate MSU_SB201401, causal agent of taproot decline of soybean in the southern U.S., is presented here. The genome assembly was 56.7 Mb in size with an L50 of 246. A total of 10,880 putative protein-encoding genes were predicted, including 647 genes encoding carbohydrate-active enzymes and 1053 genes encoding secreted proteins. This is the first draft genome of a plant-pathogenic Xylaria sp. associated with soybean. The draft genome of Xylaria sp. isolate MSU_SB201401 will provide an important resource for future experiments to determine the molecular basis of pathogenesis.
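The L50 reported for this and the preceding draft genome is a count rather than a length: the smallest number of contigs, taken from longest to shortest, whose combined length reaches half of the assembly. A minimal sketch follows; the function name and example lengths are hypothetical and unrelated to either genome.

```python
def l50(lengths):
    """Return the L50: how many of the longest contigs are needed
    to cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in bp (illustration only).
print(l50([100, 80, 60, 40, 20]))
```

A lower L50 for a given assembly size means that fewer, longer contigs carry half of the sequence, so the 246 reported here indicates a more contiguous assembly than an L50 of 691 at a similar genome size.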
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7. Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions: Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
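The two quality-based steps recommended in the conclusions, base trimming and read filtering, can be illustrated with a small sketch. This is not ngsShoRT's implementation (ngsShoRT is written in Perl and its exact algorithms are not given in the abstract); the function names, the thresholds of Q20 and 4 bases, and the assumption of Phred+33 quality encoding are all choices made for this example.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def keep_read(quality, min_mean_q=20, min_len=4):
    """Keep a read only if it is long enough and its mean quality is adequate."""
    scores = phred_scores(quality)
    return len(scores) >= min_len and sum(scores) / len(scores) >= min_mean_q


# Hypothetical read: six high-quality bases followed by two poor ones.
seq, qual = "ACGTACGT", "IIIIII#!"
trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
print(trimmed_seq, keep_read(trimmed_qual))
```

Trimming before filtering, as done here, lets a read with a poor 3' tail survive on the strength of its remaining bases; applying the filter first would discard more data.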
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lapidus, Alla L.
From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
Phylogenomics of Colombian Helicobacter pylori isolates.
Gutiérrez-Escobar, Andrés Julián; Trujillo, Esperanza; Acevedo, Orlando; Bravo, María Mercedes
2017-01-01
During the Spanish colonisation of South America, African slaves and Europeans arrived in the continent with their corresponding load of pathogens, including Helicobacter pylori. Colombian strains have been clustered with the hpEurope population and with the hspWestAfrica subpopulation in multilocus sequence typing (MLST) studies. However, ancestry studies have revealed the presence of population components specific to H. pylori in Colombia. The aim of this study was to perform a thorough phylogenomic analysis to describe the evolution of the Colombian urban H. pylori isolates. A total of 115 genomes of H. pylori were sequenced with Illumina technology from H. pylori isolates obtained in Colombia in a region of high risk for gastric cancer. The genomes were assembled, annotated and underwent phylogenomic analysis with 36 reference strains. Additionally, population differentiation analyses were performed for two bacterial genes. The phylogenetic tree revealed clustering of the Colombian strains with hspWestAfrica and hpEurope, along with three clades formed exclusively by Colombian strains, suggesting the presence of independent evolutionary lines for Colombia. Additionally, the nucleotide diversity of horB and vacA genes from Colombian isolates was lower than in the reference strains and showed a significant genetic differentiation supporting the hypothesis of independent clades with recent evolution. The presence of specific lineages suggests the existence of an hspColombia subtype that emerged from a small and relatively isolated ancestral population that accompanied crossbreeding of human population in Colombia.
Moltó-García, Belén; Liébana-Martos, María del Carmen; Cuadros-Moronta, Elena; Rodríguez-Granger, Javier; Sampedro-Martínez, Antonio; Rosa-Fraile, Manuel; Gutierrez-Fernández, José; Puertas-Priet, Alberto; Navarro-Marí, José María
2016-03-01
Streptococcus agalactiae (Group B streptococcus, GBS) is increasingly recognized as a pathogen in adult populations, including the elderly. Appropriate treatment involves antibiotics. An alternative to this strategy would be the administration of a polysaccharide vaccine; therefore, the capsular serotypes and molecular characteristics of circulating strains need to be known. Few studies have been conducted in this population. One hundred and seven GBS isolates collected from vagino-rectal swabs from 600 post-menopausal women were analysed for their capsular type, antimicrobial resistance and genetic relatedness (multilocus sequence typing, MLST). The colonization rate was 17.8%. Capsular type III was predominant (34.6%), followed by type V (22.4%). The most frequent sequence type (ST) was 19 (23.3%), followed by 23 (18.7%), 1 (16.8%) and 17 (12.1%). Isolates were assembled into three phylogenetic groups from ST-19, ST-23 and ST-17 founders. All isolates were susceptible to penicillin, whereas resistance to erythromycin and clindamycin was recorded in 23.4% and 20.6% of isolates, respectively. In our setting, the GBS colonization rate in postmenopausal women is similar to that reported in other populations studied. The population structure of these isolates is highly diverse and contains different STs. These data can contribute to the future development of a polysaccharide vaccine for preventing GBS infection in older adults. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Genome analysis of E. coli isolated from Crohn's disease patients.
Rakitina, Daria V; Manolov, Alexander I; Kanygina, Alexandra V; Garushyants, Sofya K; Baikova, Julia P; Alexeev, Dmitry G; Ladygina, Valentina G; Kostryukova, Elena S; Larin, Andrei K; Semashko, Tatiana A; Karpova, Irina Y; Babenko, Vladislav V; Ismagilova, Ruzilya K; Malanin, Sergei Y; Gelfand, Mikhail S; Ilina, Elena N; Gorodnichev, Roman B; Lisitsyna, Eugenia S; Aleshkin, Gennady I; Scherbakov, Petr L; Khalif, Igor L; Shapina, Marina V; Maev, Igor V; Andreev, Dmitry N; Govorun, Vadim M
2017-07-19
Escherichia coli (E. coli) has been increasingly implicated in the pathogenesis of Crohn's disease (CD). The phylogeny of E. coli isolated from Crohn's disease patients (CDEC) was controversial, and while genotyping results suggested heterogeneity, the sequenced strains of E. coli from CD patients were closely related. We performed shotgun genome sequencing of 28 E. coli isolates from ten CD patients and compared genomes from these isolates with already published genomes of CD strains and other pathogenic and non-pathogenic strains. CDEC was shown to belong to A, B1, B2 and D phylogenetic groups. The plasmid and several operons from the reference CD-associated E. coli strain LF82 were demonstrated to be more often present in CDEC genomes belonging to different phylogenetic groups than in genomes of commensal strains. The operons include carbon-source induced invasion GimA island, prophage I, iron uptake operons I and II, capsular assembly pathogenetic island IV and propanediol and galactitol utilization operons. Our findings suggest that CDEC are phylogenetically diverse. However, some strains isolated from independent sources possess highly similar chromosomes or plasmids. Though no CD-specific genes or functional domains were present in all CD-associated strains, some genes and operons are more often found in the genomes of CDEC than in commensal E. coli. They are principally linked to gut colonization and utilization of propanediol and other sugar alcohols.
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emergence of second-generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Torella, JP; Lienert, F; Boehm, CR
2014-08-07
Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.
GFinisher: a new strategy to refine and finish bacterial genome assemblies
NASA Astrophysics Data System (ADS)
Guizelini, Dieval; Raittz, Roberto T.; Cruz, Leonardo M.; Souza, Emanuel M.; Steffens, Maria B. R.; Pedrosa, Fabio O.
2016-10-01
Despite developments in DNA sequencing technology that have increased the number and length of reads, the process of reconstructing complete genome sequences, the so-called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented into contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating a fuzzy GC skew graph for each contig in an assembly and then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
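The GC skew signal mentioned in this abstract is conventionally computed per window as (G - C) / (G + C). The following is a minimal Python sketch of that plain windowed skew and its running sum; it is not GFinisher's fuzzy variant, which the abstract does not detail, and the function names and window size are illustrative assumptions.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C get a neutral skew of 0.0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of the window skews."""
    total = 0.0
    out = []
    for s in skews:
        total += s
        out.append(total)
    return out
```

In many bacterial chromosomes the extrema of the cumulative curve tend to fall near the replication origin and terminus, which is why an abrupt, unexpected change in skew inside a contig can hint at a misassembly.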
Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.
2016-01-01
Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822
GFinisher: a new strategy to refine and finish bacterial genome assemblies.
Guizelini, Dieval; Raittz, Roberto T; Cruz, Leonardo M; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O
2016-10-10
Despite developments in DNA sequencing technology that have increased the number and length of reads, the process of reconstructing complete genome sequences, the so-called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented into contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating a fuzzy GC skew graph for each contig in an assembly and then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.
Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J
2017-01-18
The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes
2011-01-01
Background: We investigate whether pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. Conclusions: BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies. PMID:21496274
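The 15L-5P regime on a 12 Mbp pool translates into a sequencing budget by simple multiplication of region size and fold coverage. The Python sketch below works through that arithmetic; the helper names are hypothetical, and the mean read length of 400 bp is an assumption chosen only for illustration, not a figure taken from the abstract.

```python
import math


def bases_required(region_bp, fold_coverage):
    """Total sequenced bases needed to cover region_bp at the given fold coverage."""
    return region_bp * fold_coverage


def reads_required(region_bp, fold_coverage, mean_read_bp):
    """Number of reads of mean length mean_read_bp needed for that coverage."""
    return math.ceil(bases_required(region_bp, fold_coverage) / mean_read_bp)


REGION_BP = 12_000_000   # 12 Mbp pool size, from the abstract
MEAN_READ_BP = 400       # assumed mean read length, for illustration only

linear_bases = bases_required(REGION_BP, 15)   # 15x linear reads
paired_bases = bases_required(REGION_BP, 5)    # 5x paired reads
linear_reads = reads_required(REGION_BP, 15, MEAN_READ_BP)
```

Under these assumptions the linear library must deliver 180 Mbp and the paired library 60 Mbp of usable sequence after pre-processing.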
An investigation of Hebbian phase sequences as assembly graphs
Almeida-Filho, Daniel G.; Lopes-dos-Santos, Vitor; Vasconcelos, Nivaldo A. P.; Miranda, José G. V.; Tort, Adriano B. L.; Ribeiro, Sidarta
2014-01-01
Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb's theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb's view that assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition. 
PMID:24782715
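The graph construction described in this abstract (one node per assembly, one edge per pair of consecutive activations) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is taken to be an already ordered list of assembly labels, edges are treated as directed and weighted by how often the transition occurs, and the function name is hypothetical. How the authors handled repeated activations of the same assembly is not stated in the abstract.

```python
from collections import Counter


def assembly_graph(activations):
    """Build nodes and weighted directed edges from an ordered activation list."""
    nodes = set(activations)
    # Each consecutive pair (a, b) contributes one count to the edge a -> b.
    edges = Counter(zip(activations, activations[1:]))
    return nodes, edges


nodes, edges = assembly_graph(["A", "B", "A", "B", "C"])
```

Graph attributes such as node degree or edge density can then be derived from `nodes` and `edges` and compared across behavioral states.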
Kolente virus, a rhabdovirus species isolated from ticks and bats in the Republic of Guinea.
Ghedin, Elodie; Rogers, Matthew B; Widen, Steven G; Guzman, Hilda; Travassos da Rosa, Amelia P A; Wood, Thomas G; Fitch, Adam; Popov, Vsevolod; Holmes, Edward C; Walker, Peter J; Vasilakis, Nikos; Tesh, Robert B
2013-12-01
Kolente virus (KOLEV) is a rhabdovirus originally isolated from ticks and a bat in Guinea, West Africa, in 1985. Although tests at the time of isolation suggested that KOLEV is a novel rhabdovirus, it has remained largely uncharacterized. We assembled the complete genome sequence of the prototype strain DakAr K7292, which was found to encode the five canonical rhabdovirus structural proteins (N, P, M, G and L) with alternative ORFs (>180 nt) in the P and L genes. Serologically, KOLEV exhibited a weak antigenic relationship with Barur and Fukuoka viruses in the Kern Canyon group. Phylogenetic analysis revealed that KOLEV represents a distinct and divergent lineage that shows no clear relationship to any rhabdovirus except Oita virus, although with limited phylogenetic resolution. In summary, KOLEV represents a novel species in the family Rhabdoviridae.
Brumm, Phillip J.; Land, Miriam L.; Mead, David A.
2015-10-05
Geobacillus thermoglucosidasius C56-YS93 was one of several thermophilic organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. Comparison of 16S rRNA sequences confirmed the classification of the strain as a G. thermoglucosidasius species. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). The genome of G. thermoglucosidasius C56-YS93 consists of one circular chromosome of 3,893,306 bp and two circular plasmids of 80,849 and 19,638 bp and has an average G + C content of 43.93%. G. thermoglucosidasius C56-YS93 possesses a xylan degradation cluster not found in the other G. thermoglucosidasius sequenced strains. Furthermore, this cluster appears to be related to the xylan degradation cluster found in G. stearothermophilus. G. thermoglucosidasius C56-YS93 possesses two plasmids not found in the other two strains. One plasmid contains a novel gene cluster coding for proteins involved in proline degradation and metabolism; the other contains a collection of mostly hypothetical proteins.
Structure and Evolution of Chlorate Reduction Composite Transposons
Clark, Iain C.; Melnyk, Ryan A.; Engelbrektson, Anna; Coates, John D.
2013-01-01
The genes for chlorate reduction in six bacterial strains were analyzed in order to gain insight into the metabolism. A newly isolated chlorate-reducing bacterium (Shewanella algae ACDC) and three previously isolated strains (Ideonella dechloratans, Pseudomonas sp. strain PK, and Dechloromarinus chlorophilus NSS) were genome sequenced and compared to published sequences (Alicycliphilus denitrificans BC plasmid pALIDE01 and Pseudomonas chloritidismutans AW-1). De novo assembly of genomes failed to join regions adjacent to genes involved in chlorate reduction, suggesting the presence of repeat regions. Using a bioinformatics approach and finishing PCRs to connect fragmented contigs, we discovered that chlorate reduction genes are flanked by insertion sequences, forming composite transposons in all four newly sequenced strains. These insertion sequences delineate regions with the potential to move horizontally and define a set of genes that may be important for chlorate reduction. In addition to core metabolic components, we have highlighted several such genes through comparative analysis and visualization. Phylogenetic analysis places chlorate reductase within a functionally diverse clade of type II dimethyl sulfoxide (DMSO) reductases, part of a larger family of enzymes with reactivity toward chlorate. Nucleotide-level forensics of regions surrounding chlorite dismutase (cld), as well as its phylogenetic clustering in a betaproteobacterial Cld clade, indicate that cld has been mobilized at least once from a perchlorate reducer to build chlorate respiration. PMID:23919996
Comparing de novo genome assembly: the long and short of it.
Narzisi, Giuseppe; Mishra, Bud
2011-04-29
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
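For reference, N50, one of the standard size-based metrics named in this abstract, is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The Python sketch below uses that common definition; the function name is illustrative, and the FRC metric itself is not reproduced here because the abstract does not specify it.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

Because N50 depends only on contig lengths, two assemblies with identical length distributions receive the same score regardless of how many misjoins they contain, which is the limitation the authors raise.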
Recent advances in sequence assembly: principles and applications.
Chen, Qingfeng; Lan, Chaowang; Zhao, Liang; Wang, Jianxin; Chen, Baoshan; Chen, Yi-Ping Phoebe
2017-11-01
The application of advanced sequencing technologies and the rapid growth of various sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphism occur frequently in genomes, and each of these has different impacts on assembly. Further, many new applications for sequencing, such as metagenomics regarding multiple species, have emerged in recent years. These not only give rise to higher complexity but also prevent short-read assembly in an efficient way. This article reviews the theoretical foundations that underlie current mapping-based assembly and de novo-based assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey primary algorithms/software and offer a discussion on the emerging challenges in assembly. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
2011-01-01
Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
Zhao, Mengran; Hsiang, Tom; Feng, Xiaoxing
2016-01-01
Noncoding RNAs (ncRNAs) have been identified in many fungi. However, no genome-scale inventory of ncRNAs has been reported for basidiomycetes. In this research, we detected 254 small noncoding RNAs (sncRNAs) in a genome assembly of an isolate (CCEF00389) of Pleurotus ostreatus, a widely cultivated edible basidiomycetous fungus. The identified sncRNAs include snRNAs, snoRNAs, tRNAs, and miRNAs. SnRNA U1 was not found by BLASTn in the CCEF00389 genome assembly or in some other basidiomycetous genomes. This implies that if snRNA U1 of basidiomycetes exists, it has a sequence that varies significantly from other organisms. By analyzing the distribution of sncRNA loci, we found that snRNAs and most tRNAs (88.6%) were located in pseudo-UTR regions, while miRNAs are commonly found in introns. To analyze the evolutionary conservation of the sncRNAs in P. ostreatus, we aligned all 254 sncRNAs to the genome assemblies of some other Agaricomycotina fungi. The results suggest that most sncRNAs (77.56%) were highly conserved in P. ostreatus, and 20% were conserved in Agaricomycotina fungi. These findings indicate that most sncRNAs of P. ostreatus were not conserved across Agaricomycotina fungi. PMID:27703969
Polyclonal emergence of vanA vancomycin-resistant Enterococcus faecium in Australia.
van Hal, Sebastiaan J; Espedido, Björn A; Coombs, Geoffrey W; Howden, Benjamin P; Korman, Tony M; Nimmo, Graeme R; Gosbell, Iain B; Jensen, Slade O
2017-04-01
To investigate the genetic context associated with the emergence of vanA VRE in Australia. The whole genomes of 18 randomly selected vanA-positive Enterococcus faecium patient isolates, collected between 2011 and 2013 from hospitals in four Australian capitals, were sequenced and analysed. In silico typing and transposon/plasmid assembly revealed that the sequenced isolates represented (in most cases) different hospital-adapted STs and were associated with a variety of different Tn1546 variants and plasmid backbone structures. The recent emergence of vanA VRE in Australia was polyclonal and not associated with the dissemination of a single 'dominant' ST or vanA-encoding plasmid. Interestingly, the factors contributing to this epidemiological change are not known and future studies may need to consider investigation of potential community sources. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Comparative Genomics of Bacteriophage of the Genus Seuratvirus
Sazinas, Pavelas; Redgwell, Tamsin; Rihtman, Branko; Grigonyte, Aurelija; Michniewski, Slawomir; Scanlan, David J; Hobman, Jon
2018-01-01
Despite being more abundant and having smaller genomes than their bacterial hosts, relatively few bacteriophages have had their genomes sequenced. Here, we isolated 14 bacteriophages from cattle slurry and performed de novo genome sequencing, assembly, and annotation. The commonly used marker genes polB and terL showed these bacteriophages to be closely related to members of the genus Seuratvirus. We performed a core-gene analysis using the 14 new and four closely related genomes. A total of 58 core genes were identified, the majority of which have no known function. These genes were used to construct a core-gene phylogeny, the results of which confirmed the new isolates to be part of the genus Seuratvirus and expanded the number of species within this genus to four. All bacteriophages within the genus contained the genes queCDE encoding enzymes involved in queuosine biosynthesis. We suggest these genes are carried as a mechanism to modify DNA in order to protect these bacteriophages against host endonucleases. PMID:29272407
Isolation of a Breast Cancer Tumor Suppressor Gene from Chromosome 3p
1997-10-01
and persistence of HPV infection and p53 mutation in cancer of the cervix uteri and the vulva. Int. J. Cancer 63, 639-645. 17 Nancarrow, J.K., Holman...heterozygosity on the short arm of chromosome 3 in carcinoma of the uterine cervix. Cancer Res. 49, 3598-3601. 9. APPENDIX. Figure Legends: Figure 2. Map...uterine cervix. Cancer Res., 49, 3598-3601. Assembled sequences were analyzed for database similarities by... 14. Cohen, A.J., Li, F.P., Berg, S
Synthetic muscle promoters: activities exceeding naturally occurring regulatory sequences
NASA Technical Reports Server (NTRS)
Li, X.; Eastman, E. M.; Schwartz, R. J.; Draghia-Akli, R.
1999-01-01
Relatively low levels of expression from naturally occurring promoters have limited the use of muscle as a gene therapy target. Myogenic restricted gene promoters display complex organization usually involving combinations of several myogenic regulatory elements. By random assembly of E-box, MEF-2, TEF-1, and SRE sites into synthetic promoter recombinant libraries, and screening of hundreds of individual clones for transcriptional activity in vitro and in vivo, several artificial promoters were isolated whose transcriptional potencies greatly exceed those of natural myogenic and viral gene promoters.
Chingandu, Nomatter; Zia-Ur-Rehman, Muhammad; Sreenivasan, Thyail N; Surujdeo-Maharaj, Surendra; Umaharan, Pathmanathan; Gutierrez, Osman A; Brown, Judith K
2017-05-01
Suspected virus-like symptoms were observed in cacao plants in Trinidad during 1943, and the viruses associated with these symptoms were designated as strains A and B of cacao Trinidad virus (CTV). However, viral etiology has not been demonstrated for either phenotype. Total DNA was isolated from symptomatic cacao leaves exhibiting the CTV A and B phenotypes and subjected to Illumina HiSeq and Sanger DNA sequencing. Based on de novo assembly, two apparently full-length badnavirus genomes of 7,533 and 7,454 nucleotides (nt) were associated with CTV strain A and B, respectively. The Trinidad badnaviral genomes contained four open reading frames, three of which are characteristic of other known badnaviruses, and a fourth that is present in only some badnaviruses. Both badnaviral genomes harbored hallmark caulimovirus-like features, including a tRNA Met priming site, a TATA box, and a polyadenylation-like signal. Pairwise comparisons of the RT-RNase H region indicated that the Trinidad isolates share 57-71% nt sequence identity with other known badnaviruses. Based on the system for badnavirus species demarcation in which viruses with less than 80% nt sequence identity in the RT-RNase gene are considered members of separate species, these isolates represent two previously unidentified badnaviruses, herein named cacao mild mosaic virus and cacao yellow vein banding virus, making them the first cacao-infecting badnaviruses identified thus far in the Western Hemisphere.
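The species demarcation rule cited in this abstract (less than 80% nucleotide identity in the RT-RNase H region indicates separate species) can be expressed as a small check on a pairwise alignment. The Python sketch below assumes the two sequences are already aligned to equal length and that columns containing a gap character are skipped; both the gap handling and the function names are illustrative assumptions, not the authors' procedure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are not scored
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def is_separate_species(identity, threshold=80.0):
    """True when identity falls below the demarcation threshold."""
    return identity < threshold
```

With the 57-71% identities reported for the Trinidad isolates, `is_separate_species` returns True for every comparison, consistent with the authors' conclusion that these are previously unidentified badnaviruses.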
Houston, L S; Cook, R G; Norris, S J
1990-01-01
A native structure containing the major 60-kilodalton common antigen polypeptide (designated TpN60) was isolated from Treponema pallidum subsp. pallidum (Nichols strain) through a combination of differential centrifugation and sucrose density gradient sedimentation. Gel filtration chromatography indicated that this structure is a high-molecular-weight homo-oligomer of TpN60. Antisera to TpN60 reacted with the groEL polypeptide of Escherichia coli, as determined by immunoperoxidase staining of two-dimensional electroblots. Electron microscopy of the isolated complex revealed a ringlike structure with a diameter of approximately 16 nm which was very similar in appearance to the groEL protein. Comparison of the N-terminal amino acid sequence of TpN60 with the deduced sequences of the E. coli groEL protein, related chaperonin proteins from mycobacteria and Coxiella burnetii, the hsp60 protein of Saccharomyces cerevisiae, the wheat ribulose bisphosphate carboxylase-oxygenase-subunit-binding protein (alpha subunit), and the human P1 mitochondrial protein indicated sequence identity at 8 of 22 to 10 of 22 residues (36 to 45% identity). We conclude that the oligomer of TpN60 is homologous to the groEL protein and related chaperonins found in a wide variety of procaryotes and eucaryotes and thus may represent a heat shock protein involved in protein folding and assembly. PMID:1971618
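The identity figures quoted above (8 of 22 to 10 of 22 residues, 36 to 45%) follow from a simple ratio of matching positions to aligned positions. The sketch below is only an illustration of that arithmetic; the function name `percent_identity` is hypothetical, and it assumes two already-aligned, gap-free sequences of equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Assumes the two sequences are already aligned, gap-free,
    and of equal length (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# The range reported in the abstract: 8 and 10 identical residues out of 22.
low = 100.0 * 8 / 22    # roughly 36.4
high = 100.0 * 10 / 22  # roughly 45.5
```

Rounded to whole numbers, the two values give the 36 and 45% quoted in the abstract.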
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Evans, Teri; Johnson, Andrew D; Loose, Matthew
2018-01-12
Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .
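The idea of iteratively extending a seed sequence into surrounding genomic sequence can be pictured with a toy greedy extension. This is not the iterassemble pipeline; the function `extend_right`, its parameters, and the exact-overlap rule are assumptions made purely for illustration.

```python
def extend_right(seed: str, reads: list[str], min_overlap: int) -> str:
    """Greedily extend `seed` to the right using exact suffix/prefix overlaps.

    Toy illustration only: real pipelines tolerate sequencing errors,
    handle both strands, and extend in both directions.
    """
    contig = seed
    unused = list(reads)
    extended = True
    while extended:
        extended = False
        for read in unused:
            # Longest usable overlap; the read must add at least one new base.
            max_len = min(len(contig), len(read) - 1)
            for k in range(max_len, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    contig += read[k:]
                    unused.remove(read)
                    extended = True
                    break
            if extended:
                break  # restart the scan with the longer contig
    return contig


# Hypothetical exon seed and three short reads.
walked = extend_right("ATGGC", ["GGCTTA", "TTAGCC", "CCCCCC"], min_overlap=3)
```

Each pass uses one read and restarts the scan, so the loop ends once no remaining read overlaps the current contig end by at least `min_overlap` bases.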
2010-01-01
Background Corynebacterium pseudotuberculosis is generally regarded as an important animal pathogen that rarely infects humans. Clinical strains are occasionally recovered from human cases of lymphadenitis, such as C. pseudotuberculosis FRC41 that was isolated from the inguinal lymph node of a 12-year-old girl with necrotizing lymphadenitis. To detect potential virulence factors and corresponding gene-regulatory networks in this human isolate, the genome sequence of C. pseudotuberculosis FRC41 was determined by pyrosequencing and functionally annotated. Results Sequencing and assembly of the C. pseudotuberculosis FRC41 genome yielded a circular chromosome with a size of 2,337,913 bp and a mean G+C content of 52.2%. Specific gene sets associated with iron and zinc homeostasis were detected among the 2,110 predicted protein-coding regions and integrated into a gene-regulatory network that is linked with both the central metabolism and the oxidative stress response of FRC41. Two gene clusters encode proteins involved in the sortase-mediated polymerization of adhesive pili that can probably mediate the adherence to host tissue to facilitate additional ligand-receptor interactions and the delivery of virulence factors. The prominent virulence factors phospholipase D (Pld) and corynebacterial protease CP40 are encoded in the genome of this human isolate. The genome annotation revealed additional serine proteases, neuraminidase H, nitric oxide reductase, an invasion-associated protein, and acyl-CoA carboxylase subunits involved in mycolic acid biosynthesis as potential virulence factors. The cAMP-sensing transcription regulator GlxR plays a key role in controlling the expression of several genes contributing to virulence. Conclusion The functional data deduced from the genome sequencing and the extended knowledge of virulence factors indicate that the human isolate C. pseudotuberculosis FRC41 is equipped with a distinct gene set promoting its survival under unfavorable environmental conditions encountered in the mammalian host. PMID:21192786
Strickland, Michelle; Tudorica, Victor; Řezáč, Milan; Thomas, Neil R; Goodacre, Sara L
2018-06-01
Spiders produce multiple silks with different physical properties that allow them to occupy a diverse range of ecological niches, including the underwater environment. Despite this functional diversity, past molecular analyses show a high degree of amino acid sequence similarity between C-terminal regions of silk genes that appear to be independent of the physical properties of the resulting silks; instead, this domain is crucial to the formation of silk fibers. Here, we present an analysis of the C-terminal domain of all known types of spider silk and include silk sequences from the spider Argyroneta aquatica, which spins the majority of its silk underwater. Our work indicates that spiders have retained a highly conserved mechanism of silk assembly, despite the extraordinary diversification of species, silk types and applications of silk over 350 million years. Sequence analysis of the silk C-terminal domain across the entire gene family shows the conservation of two uncommon amino acids that are implicated in the formation of a salt bridge, a functional bond essential to protein assembly. This conservation extends to the novel sequences isolated from A. aquatica. This finding is relevant to research regarding the artificial synthesis of spider silk, suggesting that synthesis of all silk types will be possible using a single process.
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
A fast sequence assembly method based on compressed data structures.
Liang, Peifeng; Zhang, Yancong; Lin, Kui; Hu, Jinglu
2014-01-01
Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, a memory- and time-efficient assembler is presented that applies the FM-index within JR-Assembler. It is called FMJ-Assembler, where FM stands for the FMR-index derived from the FM-index and BWT, and J stands for jumping extension. The FMJ-Assembler uses an expanded FM-index and BWT to compress read data and save memory, and the jumping extension method makes it faster in CPU time. An extensive comparison of the FMJ-Assembler with current assemblers shows that the FMJ-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less CPU time. These advantages indicate that the FMJ-Assembler will be an efficient assembly method in next generation sequencing technology.
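For readers unfamiliar with the data structures named above, the following is a minimal textbook sketch of the Burrows-Wheeler transform (BWT) and FM-index backward search. It is not the FMR-index or the FMJ-Assembler code; all function names are hypothetical, and the naive suffix sort is chosen for clarity rather than memory efficiency.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via a naive suffix sort ('$' is the sentinel)."""
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The character preceding each sorted suffix (wraps to '$' for suffix 0).
    return "".join(text[i - 1] for i in suffix_array)


def build_fm_index(bwt_str: str):
    """Return (C, occ): C[c] counts characters smaller than c in the text,
    occ[c][i] counts occurrences of c in bwt_str[:i]."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Backward search: number of times `pattern` occurs in the indexed text."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("GATTACAGATTACA")
C, occ = build_fm_index(transformed)
hits = count_occurrences("ATTA", C, occ, len(transformed))
```

Backward search processes the pattern from its last character to its first, narrowing a range of sorted suffixes at each step, so the query cost depends on the pattern length rather than the text length. The full `occ` table kept here is the memory-hungry part; practical indexes sample it.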
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-06-14
Bradyrhizobium sp. Ai1a-2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen fixing root nodule of Andira inermis collected from Tres Piedras in Costa Rica. In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 9,029,266 bp genome has a GC content of 62.56% with 247 contigs arranged into 246 scaffolds. The assembled genome contains 8,482 protein-coding genes and 102 RNA-only encoding genes. Lastly, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
GENESUS: a two-step sequence design program for DNA nanostructure self-assembly.
Tsutsumi, Takanobu; Asakawa, Takeshi; Kanegami, Akemi; Okada, Takao; Tahira, Tomoko; Hayashi, Kenshi
2014-01-01
DNA has been recognized as an ideal material for bottom-up construction of nanometer scale structures by self-assembly. The generation of sequences optimized for unique self-assembly (GENESUS) program reported here is a straightforward method for generating sets of strand sequences optimized for self-assembly of arbitrarily designed DNA nanostructures by a generate-candidates-and-choose-the-best strategy. A scalable procedure to prepare single-stranded DNA having arbitrary sequences is also presented. Strands for the assembly of various structures were designed and successfully constructed, validating both the program and the procedure.
Chen, Shen-Bo; Wang, Yue; Kassegne, Kokouvi; Xu, Bin; Shen, Hai-Mo; Chen, Jun-Hu
2017-02-06
In China, the number of Plasmodium vivax cases imported from Southeast Asia has increased, especially in the China-Myanmar border area. Driven by the increase in P. vivax cases and a stronger need for vaccine and drug development, several P. vivax isolate genome sequencing projects are underway. However, little is known about the genetic variability in this area until now. The sequencing of the first P. vivax isolate from the China-Myanmar border area (CMB-1) generated 120 million paired-end reads. Of the quality-evaluated reads, 10.6% were aligned onto 99.9% of the reference strain Sal I genome at 62-fold coverage, with an average of 4.8 SNPs per kb. We present a 539-SNP marker data set for P. vivax that can identify different parasites from different geographic origins with high sensitivity. We also identified exceptionally high levels of genetic variability in members of multigene families such as RBP, SERA, vir, MSP3 and AP2. The de novo assembly yielded a database composed of 8,409 contigs with an N50 length of 6.6 kb and revealed 661 novel predicted genes including 78 vir genes, suggesting a greater functional variation in P. vivax from this area. Our result contributes to a better understanding of P. vivax genetic variation, and provides a fundamental basis for the geographic differentiation of vivax malaria from the China-Myanmar border area using a direct sequencing approach without leukocyte depletion. This novel sequencing method can be used as an essential tool for the genomic research of P. vivax in the near future.
Maiga, Mamoudou; Abeel, Thomas; Shea, Terrance; Desjardins, Christopher A.; Diarra, Bassirou; Baya, Bocar; Sanogo, Moumine; Diallo, Souleymane; Earl, Ashlee M.; Bishai, William R.
2016-01-01
Background Mycobacterium africanum, made up of lineages 5 and 6 within the Mycobacterium tuberculosis complex (MTC), causes up to half of all tuberculosis cases in West Africa, but is rarely found outside of this region. The reasons for this geographical restriction remain unknown. Possible reasons include a geographically restricted animal reservoir, a unique preference for hosts of West African ethnicity, and an inability to compete with other lineages outside of West Africa. These latter two hypotheses could be caused by loss of fitness or altered interactions with the host immune system. Methodology/Principal Findings We sequenced 92 MTC clinical isolates from Mali, including two lineage 5 and 24 lineage 6 strains. Our genome sequencing assembly, alignment, phylogeny and average nucleotide identity analyses enabled us to identify features that typify lineages 5 and 6 and made clear that these lineages do not constitute a distinct species within the MTC. We found that in Mali, lineage 6 and lineage 4 strains have similar levels of diversity and evolve drug resistance through similar mechanisms. In the process, we identified a putative novel streptomycin resistance mutation. In addition, we found evidence of person-to-person transmission of lineage 6 isolates and showed that lineage 6 is not enriched for mutations in virulence-associated genes. Conclusions This is the largest collection of lineage 5 and 6 whole genome sequences to date, and our assembly and alignment data provide valuable insights into what distinguishes these lineages from other MTC lineages. Lineages 5 and 6 do not appear to be geographically restricted due to an inability to transmit between West African hosts or to an elevated number of mutations in virulence-associated genes. However, lineage-specific mutations, such as mutations in cell wall structure, secretion systems and cofactor biosynthesis, provide alternative mechanisms that may lead to host specificity. PMID:26751217
Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; ...
2014-06-14
Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Our results show Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. As to availability and implementation–all assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.
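The N50 summary statistic mentioned above has a compact definition: the length of the contig at which the running total of contig lengths, sorted longest first, reaches at least half of the total assembly length. The sketch below assumes that common definition; the function name `n50` and the example lengths are illustrative, not taken from the study.

```python
def n50(lengths: list[int]) -> int:
    """Contig length at which the cumulative sum of lengths (longest first)
    first reaches at least half of the total assembly length."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths: total 54, half 27, reached at 10 + 9 + 8.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 rewards a few long contigs, it is usually read alongside the contig count and total assembly size rather than on its own.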
Ion thruster design and analysis
NASA Technical Reports Server (NTRS)
Kami, S.; Schnelker, D. E.
1976-01-01
Questions concerning the mechanical design of a thruster are considered, taking into account differences in the design of an 8-cm and a 30-cm model. The components of a thruster include the thruster shell assembly, the ion extraction electrode assembly, the cathode isolator vaporizer assembly, the neutralizer isolator vaporizer assembly, ground screen and mask, and the main isolator vaporizer assembly. Attention is given to the materials used in thruster fabrication, the advanced manufacturing methods used, details of thruster performance, an evaluation of thruster life, structural and thermal design considerations, and questions of reliability and quality assurance.
Aguilera-Cogley, Vidal Antonio; Berbegal, Mónica; Català, Santiago; Brentu, Francis Collison; Armengol, Josep; Vicent, Antonio
2017-01-01
Greasy spot of citrus, caused by Zasmidium citri-griseum (= Mycosphaerella citri), is widely distributed in the Caribbean Basin, inducing leaf spots, premature defoliation, and yield loss. Greasy spot-like symptoms were frequently observed in humid citrus-growing regions in Panama as well as in semi-arid areas in Spain, but disease aetiology was unknown. Citrus-growing areas in Panama and Spain were surveyed and isolates of Mycosphaerellaceae were obtained from citrus greasy spot lesions. A selection of isolates from Panama (n = 22) and Spain (n = 16) was assembled based on their geographical origin, citrus species, and affected tissue. The isolates were characterized based on multi-locus DNA (ITS and EF-1α) sequence analyses, morphology, growth at different temperatures, and independent pathogenicity tests on the citrus species most affected in each country. Reference isolates and sequences were also included in the analysis. Isolates from Panama were identified as Z. citri-griseum complex, and others from Spain attributed to Amycosphaerella africana. Isolates of the Z. citri-griseum complex had a significantly higher optimal growth temperature (26.8°C) than those of A. africana (19.3°C), which corresponded well with their actual biogeographical range. The isolates of the Z. citri-griseum complex from Panama induced typical greasy spot symptoms in 'Valencia' sweet orange plants and the inoculated fungi were reisolated. No symptoms were observed in plants of the 'Ortanique' tangor inoculated with A. africana. These results demonstrate the presence of citrus greasy spot, caused by Z. citri-griseum complex, in Panama whereas A. africana was associated with greasy spot-like symptoms in Spain.
Evolutionary origins of the emergent ST796 clone of vancomycin resistant Enterococcus faecium
Buultjens, Andrew H.; Lam, Margaret M.C.; Ballard, Susan; Monk, Ian R.; Mahony, Andrew A.; Grabsch, Elizabeth A.; Grayson, M. Lindsay; Pang, Stanley; Coombs, Geoffrey W.; Robinson, J. Owen; Seemann, Torsten; Howden, Benjamin P.
2017-01-01
From early 2012, a novel clone of vancomycin resistant Enterococcus faecium (assigned the multi locus sequence type ST796) was simultaneously isolated from geographically separate hospitals in south eastern Australia and New Zealand. Here we describe the complete genome sequence of Ef_aus0233, a representative ST796 E. faecium isolate. We used PacBio single molecule real-time sequencing to establish a high quality, fully assembled genome comprising a circular chromosome of 2,888,087 bp and five plasmids. Comparison of Ef_aus0233 to other E. faecium genomes shows Ef_aus0233 is a member of the epidemic hospital-adapted lineage and has evolved from an ST555-like ancestral progenitor by the accumulation or modification of five mosaic plasmids and five putative prophages, acquisition of two cryptic genomic islands, accrued chromosomal single nucleotide polymorphisms and an 80 kb region of recombination, also gaining Tn1549 and Tn916, transposons conferring resistance to vancomycin and tetracycline respectively. The genomic dissection of this new clone presented here underscores the propensity of the hospital E. faecium lineage to change, presumably in response to the specific conditions of hospital and healthcare environments. PMID:28149688
A new strategy for genome assembly using short sequence reads and reduced representation libraries.
Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H
2010-02-01
We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need of a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the current available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
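The partitioning step can be pictured as an in silico restriction digest followed by size selection. The sketch below is an assumption-laden illustration: the abstract does not name the enzymes, so EcoRI (G^AATTC) is used purely as a familiar example, and the functions `digest` and `size_select` are hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site
    (1 for an enzyme that cuts after the first base, as in G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments within a length window, mimicking gel size selection."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two EcoRI sites (illustrative only).
toy = "AAGAATTCTTTTGAATTCCC"
pieces = digest(toy, "GAATTC", 1)
library = size_select(pieces, 5, 8)
```

Running the same digest with a second recognition site would yield a differently partitioned fragment set, which is the sense in which two independently digested libraries overlap.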
Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing
2010-01-01
Full-length genomes of four rabies virus isolates from Chinese ferret-badger and dog were sequenced in order to analyze rabies virus genetic variation at the molecular level, to obtain information about rabies virus prevalence and variation in Zhejiang, and to enrich the genome database of rabies virus street strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled. Nucleotide and deduced protein similarities were determined, and phylogenetic analyses were performed, against strains from Chinese ferret-badger, dog, sika deer, and vole and against vaccine strains in use. The four full-length genomes were sequenced completely and shared the same genetic structure, with lengths of 11,923 nt or 11,925 nt, comprising a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5, and 5 nt, a 423-nt pseudogene-like sequence (psi), and a 70-nt trailer. By BLAST and multiple sequence alignment, the four genomes were consistent with the properties of Lyssavirus in the family Rhabdoviridae. Nucleotide and amino acid sequences were most similar among the Chinese strains, especially among strains from the same animal species. For the four genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, indicating that most nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged and no recombination was found, only a few point mutations, so the five proteins appear to be stable. The variation sites and types in the four genomes were similar to those of the reference vaccine and street strains. According to multiple sequence alignment and phylogenetic analyses, the four strains belong to genotype 1 and show regional characteristics distinctive of China.
Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.
Construction of Red Fox Chromosomal Fragments from the Short-Read Genome Assembly.
Rando, Halie M; Farré, Marta; Robson, Michael P; Won, Naomi B; Johnson, Jennifer L; Buch, Ronak; Bastounes, Estelle R; Xiang, Xueyan; Feng, Shaohong; Liu, Shiping; Xiong, Zijun; Kim, Jaebum; Zhang, Guojie; Trut, Lyudmila N; Larkin, Denis M; Kukekova, Anna V
2018-06-20
The genome of a red fox (Vulpes vulpes) was recently sequenced and assembled using next-generation sequencing (NGS). The assembly is of high quality, with 94X coverage and a scaffold N50 of 11.8 Mbp, but is split into 676,878 scaffolds, some of which are likely to contain assembly errors. Fragmentation and misassembly hinder accurate gene prediction and downstream analysis such as the identification of loci under selection. Therefore, assembly of the genome into chromosome-scale fragments was an important step towards developing this genomic model. Scaffolds from the assembly were aligned to the dog reference genome and compared to the alignment of an outgroup genome (cat) against the dog to identify syntenic sequences among species. The program Reference-Assisted Chromosome Assembly (RACA) then integrated the comparative alignment with the mapping of the raw sequencing reads generated during assembly against the fox scaffolds. The 128 sequence fragments RACA assembled were compared to the fox meiotic linkage map to guide the construction of 40 chromosomal fragments. This computational approach to assembly was facilitated by prior research in comparative mammalian genomics, and the continued improvement of the red fox genome can in turn offer insight into canid and carnivore chromosome evolution. This assembly is also necessary for advancing genetic research in foxes and other canids.
Holmes, Anne; Allison, Lesley; Ward, Melissa; Dallman, Timothy J; Clark, Richard; Fawkes, Angie; Murphy, Lee; Hanson, Mary
2015-11-01
Detailed laboratory characterization of Escherichia coli O157 is essential to inform epidemiological investigations. This study assessed the utility of whole-genome sequencing (WGS) for outbreak detection and epidemiological surveillance of E. coli O157, and the data were used to identify discernible associations between genotypes and clinical outcomes. One hundred five E. coli O157 strains isolated over a 5-year period from human fecal samples in Lothian, Scotland, were sequenced with the Ion Torrent Personal Genome Machine. A total of 8,721 variable sites in the core genome were identified among the 105 isolates; 47% of the single nucleotide polymorphisms (SNPs) were attributable to six "atypical" E. coli O157 strains and included recombinant regions. Phylogenetic analyses showed that WGS correlated well with the epidemiological data. Epidemiological links existed between cases whose isolates differed by three or fewer SNPs. WGS also correlated well with multilocus variable-number tandem repeat analysis (MLVA) typing data, with only three discordant results observed, all among isolates from cases not known to be epidemiologically related. WGS produced a better-supported, higher-resolution phylogeny than MLVA, confirming that the method is more suitable for epidemiological surveillance of E. coli O157. A combination of in silico analyses (VirulenceFinder, ResFinder, and local BLAST searches) were used to determine stx subtypes, multilocus sequence types (15 loci), and the presence of virulence and acquired antimicrobial resistance genes. There was a high level of correlation between the WGS data and our routine typing methods, although some discordant results were observed, mostly related to the limitation of short sequence read assembly. The data were used to identify sublineages and clades of E. coli O157, and when they were correlated with the clinical outcome data, they showed that one clade, Ic3, was significantly associated with severe disease. 
Together, the results show that WGS data can provide higher resolution of the relationships between E. coli O157 isolates than that provided by MLVA. The method has the potential to streamline the laboratory workflow and provide detailed information for the clinical management of patients and public health interventions. Copyright © 2015, Holmes et al.
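The observation that epidemiologically linked cases had isolates differing by three or fewer SNPs can be illustrated with a pairwise distance calculation. This is a sketch under stated assumptions, not the study's pipeline: the isolate names and sequences are invented, `snp_distance` and `linked_pairs` are hypothetical helpers, and the input is assumed to be equal-length core-genome alignments.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(isolates: dict[str, str], max_snps: int = 3) -> list[tuple[str, str]]:
    """Pairs of isolates whose SNP distance is at most `max_snps`."""
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in combinations(sorted(isolates.items()), 2)
        if snp_distance(seq1, seq2) <= max_snps
    ]


# Invented toy alignment columns for three isolates.
toy_isolates = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TTTTACGA",
}
pairs = linked_pairs(toy_isolates)
```

A threshold of this kind is a screening aid: in the abstract it describes where epidemiological links were found, so pairs under the cutoff would still need epidemiological confirmation.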
Allison, Lesley; Ward, Melissa; Dallman, Timothy J.; Clark, Richard; Fawkes, Angie; Murphy, Lee; Hanson, Mary
2015-01-01
Detailed laboratory characterization of Escherichia coli O157 is essential to inform epidemiological investigations. This study assessed the utility of whole-genome sequencing (WGS) for outbreak detection and epidemiological surveillance of E. coli O157, and the data were used to identify discernible associations between genotypes and clinical outcomes. One hundred five E. coli O157 strains isolated over a 5-year period from human fecal samples in Lothian, Scotland, were sequenced with the Ion Torrent Personal Genome Machine. A total of 8,721 variable sites in the core genome were identified among the 105 isolates; 47% of the single nucleotide polymorphisms (SNPs) were attributable to six “atypical” E. coli O157 strains and included recombinant regions. Phylogenetic analyses showed that WGS correlated well with the epidemiological data. Epidemiological links existed between cases whose isolates differed by three or fewer SNPs. WGS also correlated well with multilocus variable-number tandem repeat analysis (MLVA) typing data, with only three discordant results observed, all among isolates from cases not known to be epidemiologically related. WGS produced a better-supported, higher-resolution phylogeny than MLVA, confirming that the method is more suitable for epidemiological surveillance of E. coli O157. A combination of in silico analyses (VirulenceFinder, ResFinder, and local BLAST searches) were used to determine stx subtypes, multilocus sequence types (15 loci), and the presence of virulence and acquired antimicrobial resistance genes. There was a high level of correlation between the WGS data and our routine typing methods, although some discordant results were observed, mostly related to the limitation of short sequence read assembly. The data were used to identify sublineages and clades of E. coli O157, and when they were correlated with the clinical outcome data, they showed that one clade, Ic3, was significantly associated with severe disease. 
Together, the results show that WGS data can provide higher resolution of the relationships between E. coli O157 isolates than that provided by MLVA. The method has the potential to streamline the laboratory workflow and provide detailed information for the clinical management of patients and public health interventions. PMID:26354815
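The three-SNP linkage criterion reported above can be illustrated with a small pairwise comparison. This is a minimal sketch, assuming the core-genome sequences are already aligned to equal length; the function names, the isolate labels, and the choice to ignore gaps and N calls are illustrative assumptions and are not taken from the study's pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (an assumption made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    distance = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ignored or y in ignored:
            continue
        if x != y:
            distance += 1
    return distance


def linked_pairs(core_alignment: dict, max_snps: int = 3) -> list:
    """Return (isolate, isolate, distance) for pairs within max_snps."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(
        sorted(core_alignment.items()), 2
    ):
        d = snp_distance(seq_a, seq_b)
        if d <= max_snps:
            pairs.append((name_a, name_b, d))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    toy = {
        "iso_A": "ACGTACGTAC",
        "iso_B": "ACGTACGTAT",
        "iso_C": "TCGAACCTGG",
    }
    print(linked_pairs(toy))
```

Pairs that fall within the threshold are candidates for an epidemiological link; the threshold of three is the value reported in the abstract and would need re-validation for other organisms or pipelines.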
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition.
This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
Hybrid error correction and de novo assembly of single-molecule sequencing reads
Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.
2012-01-01
Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884
SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.
Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C
2016-12-15
The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Completing an assembly project also requires installing or modifying various software packages. Users therefore need significant expertise in these packages and command-line scripting experience on Unix platforms, in addition to basic expertise in the methodologies and techniques of genome assembly. These difficulties often delay the completion of genome assembly projects. To overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so that bioinformaticians, even those with little computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process with simple, interactive pages. The SIMBA workflow is divided into three modules: (i) projects, which gives an overview of genome sequencing projects and supports data quality analysis and data format conversion; (ii) assemblies, which supports de novo assembly with Mira, Minia, Newbler and SPAdes, and assembly quality validation with QUAST; and (iii) curation, which provides methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validated the efficacy of SIMBA in managing bacterial assembly projects sequenced using Ion Torrent PGM.
Besides being a web tool for genome assembly, SIMBA is a complete project management system for genome assemblies, which can be useful for managing several projects in a laboratory. The SIMBA source code is available to download and install on local web servers at http://ufmg-simba.sourceforge.net.
Byrd, Allyson L; Deming, Clay; Cassidy, Sara K B; Harrison, Oliver J; Ng, Weng-Ian; Conlan, Sean; Belkaid, Yasmine; Segre, Julia A; Kong, Heidi H
2017-07-05
The heterogeneous course, severity, and treatment responses among patients with atopic dermatitis (AD; eczema) highlight the complexity of this multifactorial disease. Prior studies have used traditional typing methods on cultivated isolates or sequenced a bacterial marker gene to study the skin microbial communities of AD patients. Shotgun metagenomic sequence analysis provides much greater resolution, elucidating multiple levels of microbial community assembly ranging from kingdom to species and strain-level diversification. We analyzed microbial temporal dynamics from a cohort of pediatric AD patients sampled throughout the disease course. Species-level investigation of AD flares showed greater Staphylococcus aureus predominance in patients with more severe disease and Staphylococcus epidermidis predominance in patients with less severe disease. At the strain level, metagenomic sequencing analyses demonstrated clonal S. aureus strains in more severe patients and heterogeneous S. epidermidis strain communities in all patients. To investigate strain-level biological effects of S. aureus , we topically colonized mice with human strains isolated from AD patients and controls. This cutaneous colonization model demonstrated S. aureus strain-specific differences in eliciting skin inflammation and immune signatures characteristic of AD patients. Specifically, S. aureus isolates from AD patients with more severe flares induced epidermal thickening and expansion of cutaneous T helper 2 (T H 2) and T H 17 cells. Integrating high-resolution sequencing, culturing, and animal models demonstrated how functional differences of staphylococcal strains may contribute to the complexity of AD disease. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Assembly & Metrology of First Wall Components of SST-1
NASA Astrophysics Data System (ADS)
Parekh, Tejas; Santra, Prosenjit; Biswas, Prabal; Patel, Hiteshkumar; Paravastu, Yuvakiran; Jaiswal, Snehal; Chauhan, Pradeep; Babu, Gattu Ramesh; A, Arun Prakash; Bhavsar, Dhaval; Raval, Dilip C.; Khan, Ziauddin; Pradhan, Subrata
2017-04-01
The First Wall Components (FWC) of the SST-1 tokamak, which are in the immediate vicinity of the plasma, comprise limiters, divertors, baffles and passive stabilizers, and are designed to operate in long-duration (1000 s) discharges of elongated plasma. All FWC consist of copper alloy heat sink modules with SS cooling tubes brazed onto them and graphite tiles acting as armour material facing the plasma, and are mounted to the vacuum vessel with suitable Inconel support structures at ring and port locations. The FWC were recently assembled and commissioned successfully inside the vacuum vessel of SST-1, following meticulous planning of the assembly sequence and quality checks at every stage of the assembly process. This paper presents the metrology aspects and procedure for each FWC, both outside and inside the vacuum vessel, together with the assembly tolerances, tools, equipment and jigs/fixtures used at each stage of assembly: from locating the support bases on the vessel rings and fixing the copper modules on the support structures, through mounting around 3800 graphite tiles on 136 copper modules with proper tightening torques, until the final toroidal and poloidal geometry of the in-vessel components is obtained within acceptable limits, while also ensuring electrical continuity of the passive stabilizers to form a closed saddle loop and electrical isolation of the passive stabilizers from the vacuum vessel.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas
2013-06-01
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
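The 99.999% consensus accuracy quoted above is often easier to compare across methods as a Phred-scaled quality value. A minimal sketch of that conversion follows; the function names and the 5 Mbp genome length are illustrative assumptions, and the code is not part of HGAP itself.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred-scaled QV.

    QV = -10 * log10(error rate), where error rate = 1 - accuracy.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the interval [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(accuracy: float, genome_length: int) -> float:
    """Expected number of erroneous bases in a consensus of the given length."""
    return (1.0 - accuracy) * genome_length


if __name__ == "__main__":
    acc = 0.99999  # 99.999% accuracy, as reported in the abstract
    print(f"QV ~ {accuracy_to_qv(acc):.1f}")
    # Hypothetical 5 Mbp bacterial genome, chosen only for illustration.
    print(f"expected errors ~ {expected_errors(acc, 5_000_000):.0f}")
```

Under these assumptions, 99.999% accuracy corresponds to roughly QV 50, or on the order of one error per 100,000 bases.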
Jayakumar, Vasanthan; Sakakibara, Yasubumi
2017-11-03
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
Patil, Yogita; Müller, Nicolai; Schink, Bernhard; ...
2017-02-20
Anaerobium acetethylicum strain GluBS11T belongs to the family Lachnospiraceae within the order Clostridiales. It is a Gram-positive, non-motile and strictly anaerobic bacterium isolated from biogas slurry that was originally enriched with gluconate as carbon source (Patil, et al., Int J Syst Evol Microbiol 65:3289-3296, 2015). Here we describe the draft genome sequence of strain GluBS11T and provide a detailed insight into its physiological and metabolic features. The draft genome sequence generated 4,609,043 bp, distributed among 105 scaffolds assembled using the SPAdes genome assembler method. It comprises in total 4,132 genes, of which 4,008 were predicted to be protein coding genes, 124 RNA genes and 867 pseudogenes. The G+C content was 43.51 mol %. The annotated genome of strain GluBS11T contains putative genes coding for the pentose phosphate pathway, the Embden-Meyerhof-Parnas pathway, the Entner-Doudoroff pathway and the tricarboxylic acid cycle. The genome revealed the presence of most of the necessary genes required for the fermentation of glucose and gluconate to acetate, ethanol, and hydrogen gas. However, a candidate gene for production of formate was not identified.
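Base composition of the kind reported above is computed directly from the assembled scaffolds, and the reported gene totals are internally consistent (4,008 protein-coding genes plus 124 RNA genes gives 4,132). A minimal sketch of a composition calculation, assuming the scaffolds are available as plain strings and that ambiguous bases are excluded from the denominator (the function name and that convention are assumptions of this example):

```python
def gc_content(sequences) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T).

    `sequences` is any iterable of DNA strings, for example the scaffolds
    of a draft assembly. Ambiguous bases such as N are not counted.
    """
    gc = 0
    at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Hypothetical toy scaffolds, not the GluBS11T assembly.
    scaffolds = ["GGCCAATT", "ACGTNNNN"]
    print(f"{gc_content(scaffolds):.2f} % G+C")
```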
Lee, Wonbae; Gillies, John P.; Jose, Davis; Israels, Brett A.; von Hippel, Peter H.; Marcus, Andrew H.
2016-01-01
Gene 32 protein (gp32) is the single-stranded (ss) DNA binding protein of the bacteriophage T4. It binds transiently and cooperatively to ssDNA sequences exposed during the DNA replication process and regulates the interactions of the other sub-assemblies of the replication complex during the replication cycle. We here use single-molecule FRET techniques to build on previous thermodynamic studies of gp32 binding to initiate studies of the dynamics of the isolated and cooperative binding of gp32 molecules within the replication complex. DNA primer/template (p/t) constructs are used as models to determine the effects of ssDNA lattice length, gp32 concentration, salt concentration, binding cooperativity and binding polarity at p/t junctions. Hidden Markov models (HMMs) and transition density plots (TDPs) are used to characterize the dynamics of the multi-step assembly pathway of gp32 at p/t junctions of differing polarity, and show that isolated gp32 molecules bind to their ssDNA targets weakly and dissociate quickly, while cooperatively bound dimeric or trimeric clusters of gp32 bind much more tightly, can ‘slide’ on ssDNA sequences, and exhibit binding dynamics that depend on p/t junction polarities. The potential relationships of these binding dynamics to interactions with other components of the T4 DNA replication complex are discussed. PMID:27694621
Conte, Matthew A; Gammerdinger, William J; Bartie, Kerry L; Penman, David J; Kocher, Thomas D
2017-05-02
Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3 Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146 Mbp of additional transposable element sequence are now assembled, a large proportion of which are recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9 Mbp XY sex determination region on LG1 in O. niloticus, and a ~50 Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
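The contig NG50 used above to describe the selected assembly can be computed from contig lengths and an estimated genome size. The sketch below uses the common definition (the length of the contig at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the reference total); the function name and the toy lengths are assumptions of this example and are not taken from the study.

```python
def nx50(contig_lengths, genome_size=None) -> int:
    """Return N50, or NG50 when an estimated genome size is supplied.

    N50 uses half of the total assembly length as the target;
    NG50 uses half of the estimated genome size instead.
    Returns 0 if the assembly never reaches the target.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    reference_total = sum(lengths) if genome_size is None else genome_size
    target = reference_total / 2
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return 0


if __name__ == "__main__":
    toy_contigs = [50, 40, 30, 20, 10]  # hypothetical contig lengths
    print("N50 :", nx50(toy_contigs))
    print("NG50:", nx50(toy_contigs, genome_size=200))
```

Because NG50 is normalized by genome size rather than assembly size, it allows candidate assemblies of different total length to be compared on the same footing.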
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
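Tracking OTU abundance, as described above, amounts to counting how many reads are assigned to each assembled OTU and normalizing by the total. The following is a minimal sketch of that bookkeeping only; the read-to-OTU mapping, the identifiers, and the function name are hypothetical and do not represent mPUMA's actual data structures or interface.

```python
from collections import Counter


def otu_relative_abundance(read_to_otu: dict) -> dict:
    """Return the fraction of assigned reads belonging to each OTU.

    `read_to_otu` maps a read identifier to the OTU it was assembled into.
    """
    counts = Counter(read_to_otu.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {otu: n / total for otu, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical assignments, for illustration only.
    assignments = {
        "read_1": "OTU_1",
        "read_2": "OTU_1",
        "read_3": "OTU_2",
        "read_4": "OTU_1",
    }
    for otu, fraction in sorted(otu_relative_abundance(assignments).items()):
        print(otu, f"{fraction:.2f}")
```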
Kelley, Dana A.; Farooque, Mohammad; Davis, Keith
2007-10-02
A fuel cell system with improved electrical isolation having a fuel cell stack with a positive potential end and a negative potential end, a manifold for use in coupling gases to and from a face of the fuel cell stack, an electrical isolating assembly for electrically isolating the manifold from the stack, and a unit for adjusting an electrical potential of the manifold so as to impede the flow of electrolyte from the stack across the isolating assembly.
Assembly of cucumber (Cucumis sativus L.) somaclones
NASA Astrophysics Data System (ADS)
Skarzyńska, Agnieszka; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.
2017-08-01
The development of next-generation sequencing opens the possibility of using sequencing in various plant studies, such as finding structural changes and small polymorphisms between and within species. Most analyses rely on genomic sequences, so it is crucial to use well-assembled genomes of high quality and completeness. Herein we compare commonly available programs for genome assembly with newly developed software, dnaasm. Assemblies were tested on cucumber (Cucumis sativus L.) lines obtained by in vitro regeneration (somaclones) and showing different phenotypes. The results show that the dnaasm assembler is a good tool for short-read assembly, yielding genomes of high quality and completeness.
Y and W Chromosome Assemblies: Approaches and Discoveries.
Tomaszkiewicz, Marta; Medvedev, Paul; Makova, Kateryna D
2017-04-01
Hundreds of vertebrate genomes have been sequenced and assembled to date. However, most sequencing projects have ignored the sex chromosomes unique to the heterogametic sex - Y and W - that are known as sex-limited chromosomes (SLCs). Indeed, haploid and repetitive Y chromosomes in species with male heterogamety (XY), and W chromosomes in species with female heterogamety (ZW), are difficult to sequence and assemble. Nevertheless, obtaining their sequences is important for understanding the intricacies of vertebrate genome function and evolution. Recent progress has been made towards the adaptation of next-generation sequencing (NGS) techniques to deciphering SLC sequences. We review here currently available methodology and results with regard to SLC sequencing and assembly. We focus on vertebrates, but bring in some examples from other taxa. Copyright © 2017 Elsevier Ltd. All rights reserved.
Kolente virus, a rhabdovirus species isolated from ticks and bats in the Republic of Guinea
Ghedin, Elodie; Rogers, Matthew B.; Widen, Steven G.; Guzman, Hilda; Travassos da Rosa, Amelia P. A.; Wood, Thomas G.; Fitch, Adam; Popov, Vsevolod; Holmes, Edward C.; Walker, Peter J.; Tesh, Robert B.
2013-01-01
Kolente virus (KOLEV) is a rhabdovirus originally isolated from ticks and a bat in Guinea, West Africa, in 1985. Although tests at the time of isolation suggested that KOLEV is a novel rhabdovirus, it has remained largely uncharacterized. We assembled the complete genome sequence of the prototype strain DakAr K7292, which was found to encode the five canonical rhabdovirus structural proteins (N, P, M, G and L) with alternative ORFs (>180 nt) in the P and L genes. Serologically, KOLEV exhibited a weak antigenic relationship with Barur and Fukuoka viruses in the Kern Canyon group. Phylogenetic analysis revealed that KOLEV represents a distinct and divergent lineage that shows no clear relationship to any rhabdovirus except Oita virus, although with limited phylogenetic resolution. In summary, KOLEV represents a novel species in the family Rhabdoviridae. PMID:24062532
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.
Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou
2016-09-01
Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Yeast prions assembly and propagation
2011-01-01
Yeast prions are self-perpetuating protein aggregates that are at the origin of heritable and transmissible non-Mendelian phenotypic traits. Among these, [PSI+], [URE3] and [PIN+] are the best documented prions and arise from the assembly of Sup35p, Ure2p and Rnq1p, respectively, into insoluble fibrillar assemblies. Fibril assembly depends on the presence of N- or C-terminal prion domains (PrDs) which are not homologous in sequence but share unusual amino-acid compositions, such as enrichment in polar residues (glutamines and asparagines) or the presence of oligopeptide repeats. Purified PrDs form amyloid fibrils that can convert prion-free cells to the prion state upon transformation. Nonetheless, isolated PrDs and full-length prion proteins have different aggregation, structural and infectious properties. In addition, mutations in the “non-prion” domains (non-PrDs) of Sup35p, Ure2p and Rnq1p were shown to affect their prion properties in vitro and in vivo. Despite this evidence, the implication of the functional non-PrDs in fibril assembly and prion propagation has been mostly overlooked. In this review, we discuss the contribution of non-PrDs to prion assemblies, and the structure-function relationship in prion infectivity in the light of recent findings on Sup35p and Ure2p assembly into infectious fibrils from our laboratory and others. PMID:22052349
NASA Technical Reports Server (NTRS)
Young, Ken (Inventor); Hindle, Timothy (Inventor); Barber, Tim Daniel (Inventor)
2016-01-01
Mounting systems for structural members, fastening assemblies thereof, and vibration isolation systems including the same are provided. Mounting systems comprise a pair of mounting brackets, each clamped against a fastening assembly forming a mounting assembly. Fastening assemblies comprise a spherical rod end comprising a spherical member having a through opening and an integrally threaded shaft, first and second seating members on opposite sides of the spherical member and each having a through opening that is substantially coaxial with the spherical member through opening, and a partially threaded fastener that threadably engages each mounting bracket forming the mounting assembly. Structural members have axial end portions, each releasably coupled to a mounting bracket by the integrally threaded shaft. Axial end portions are threaded in opposite directions for permitting structural member rotation to adjust a length thereof to a substantially zero strain position. Structural members may be vibration isolator struts in vibration isolation systems.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.
Tully, Benjamin J; Graham, Elaina D; Heidelberg, John F
2018-01-16
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.
De novo Genome Assembly of the Fungal Plant Pathogen Pyrenophora semeniperda
Soliai, Marcus M.; Meyer, Susan E.; Udall, Joshua A.; Elzinga, David E.; Hermansen, Russell A.; Bodily, Paul M.; Hart, Aaron A.; Coleman, Craig E.
2014-01-01
Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen that has a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a biocontrol agent due to its effectiveness at killing seeds within the seed bank; however, few genetic resources exist for the fungus. Here, the genome of a P. semeniperda isolate, assembled from 454 pyrosequencing reads, is presented. The total assembly is 32.5 Mb and includes 11,453 gene models encoding putative proteins larger than 24 amino acids. The models represent a variety of putative genes that are involved in pathogenic pathways typically found in necrotrophic fungi. In addition, extensive rearrangements, including inter- and intrachromosomal rearrangements, were found when the P. semeniperda genome was compared to P. tritici-repentis, a related fungal species. PMID:24475219
Local circulating clones of Staphylococcus aureus in Ecuador.
Zurita, Jeannete; Barba, Pedro; Ortega-Paredes, David; Mora, Marcelo; Rivadeneira, Sebastián
The spread of pandemic Staphylococcus aureus clones, mainly methicillin-resistant S. aureus (MRSA), must be kept under surveillance to assemble an accurate, local epidemiological analysis. In Ecuador, the prevalence of the USA300 Latin American variant clone (USA300-LV) is well known; however, there is little information about other circulating clones. The aim of this work was to identify the sequence types (ST) using a Multiple-Locus Variable number tandem repeat Analysis 14-locus genotyping approach. We analyzed 132 S. aureus strains that were recovered from 2005 to 2013 and isolated in several clinical settings in Quito, Ecuador. MRSA isolates composed 46.97% (62/132) of the study population. Within MRSA, 37 isolates were related to the USA300-LV clone (ST8-MRSA-IV, Panton-Valentine Leukocidin [PVL] +) and 10 were related to the Brazilian clone (ST239-MRSA-III, PVL-). Additionally, two isolates (ST5-MRSA-II, PVL-) were related to the New York/Japan clone. One isolate was related to the Pediatric clone (ST5-MRSA-IV, PVL-), one isolate (ST45-MRSA-II, PVL-) was related to the USA600 clone, and one (ST22-MRSA-IV, PVL-) was related to the epidemic UK-EMRSA-15 clone. Moreover, the most prevalent MSSA sequence types were ST8 (11 isolates), ST45 (8 isolates), ST30 (8 isolates), ST5 (7 isolates) and ST22 (6 isolates). Additionally, we found one isolate that was related to the livestock associated S. aureus clone ST398. We conclude that in addition to the high prevalence of clone LV-ST8-MRSA-IV, other epidemic clones are circulating in Quito, such as the Brazilian, Pediatric and New York/Japan clones. The USA600 and UK-EMRSA-15 clones, which were not previously described in Ecuador, were also found. Moreover, we found evidence of the presence of the livestock associated clone ST398 in a hospital environment. Copyright © 2016 Sociedade Brasileira de Infectologia. Published by Elsevier Editora Ltda. All rights reserved.
Self-Assembly of Measles Virus Nucleocapsid-like Particles: Kinetics and RNA Sequence Dependence.
Milles, Sigrid; Jensen, Malene Ringkjøbing; Communie, Guillaume; Maurin, Damien; Schoehn, Guy; Ruigrok, Rob W H; Blackledge, Martin
2016-08-01
Measles virus RNA genomes are packaged into helical nucleocapsids (NCs), comprising thousands of nucleoproteins (N) that bind the entire genome. N-RNA provides the template for replication and transcription by the viral polymerase and is a promising target for viral inhibition. Elucidation of mechanisms regulating this process has been severely hampered by the inability to controllably assemble NCs. Here, we demonstrate self-organization of N into NC-like particles in vitro upon addition of RNA, providing a simple and versatile tool for investigating assembly. Real-time NMR and fluorescence spectroscopy reveal biphasic assembly kinetics. Remarkably, assembly depends strongly on the RNA sequence, with the genomic 5' end and poly-Adenine sequences assembling efficiently, while sequences such as poly-Uracil are incompetent for NC formation. This observation has important consequences for understanding the assembly process. © 2016 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur
2011-08-01
Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies, the amount of monetary and temporal resources required for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence; however, depending on the NGS platform used, the shortness of read lengths causes tremendous problems in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.
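The two de novo paradigms named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `de_bruijn_edges` and `suffix_prefix_overlap` are hypothetical, reads are assumed to be error-free strings, and only the first step of each method is shown (graph construction for De Bruijn, pairwise overlap detection for Overlap-Layout-Consensus).

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a De Bruijn graph: each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def suffix_prefix_overlap(a, b, min_len=3):
    """Overlap step of OLC: length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0  # no overlap of at least min_len bases


if __name__ == "__main__":
    print(de_bruijn_edges(["ACGT"], 3))
    print(suffix_prefix_overlap("ACGTAC", "TACGG"))
```

The De Bruijn graph breaks reads into fixed-size k-mers and never compares reads pairwise, whereas the overlap step compares whole reads against each other; a hybrid strategy would draw on both representations.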
Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-01-01
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
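The N50 values quoted in the abstract above (117 kb, rising to 498 kb) follow the standard definition: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one contig")


if __name__ == "__main__":
    # Toy example: total 20 bases; 6 + 5 = 11 covers half, so N50 is 5.
    print(n50([2, 3, 4, 5, 6]))
```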
Pervasiveness of UVC254-resistant Geobacillus strains in extreme environments.
Carlson, Courtney; Singh, Nitin K; Bibra, Mohit; Sani, Rajesh K; Venkateswaran, Kasthuri
2018-02-01
We have characterized a broad collection of extremophilic bacterial isolates from a deep subsurface mine, compost dumping sites, and several hot spring ecosystems. Spore-forming strains isolated from these environments comprised both obligate thermophiles/thermotolerant species (growing at > 55 °C; 240 strains) and mesophiles (growing at 15 to 40 °C; 12 strains). An overwhelming abundance of Geobacillus (81.3%) and Bacillus (18.3%) species was observed among the tested isolates. 16S rRNA sequence analysis documented the presence of 24 species among these isolates, but the 16S rRNA gene was shown to possess insufficient resolution to reliably discern Geobacillus phylogeny. gyrB-based phylogenetic analyses of nine strains revealed the presence of six known Geobacillus and one novel species. Multilocus sequence typing analyses based on seven different housekeeping genes deduced from whole genome sequencing of nine strains revealed the presence of three novel Geobacillus species. The vegetative cells of 41 Geobacillus strains were exposed to UVC254, and most (34 strains) survived 120 J/m², while seven strains survived 300 J/m², and cells of only one Geobacillus strain isolated from a compost facility survived 600 J/m². Additionally, the UVC254 inactivation kinetics of spores from four Geobacillus strains isolated from three distinct geographical regions were evaluated and compared to that of a spacecraft assembly facility (SAF) clean room Geobacillus strain. The purified spores of the thermophilic SAF strain exhibited resistance to 2000 J/m², whereas spores of two environmental Geobacillus strains showed resistance to 1000 J/m². This study is the first to investigate UV resistance of environmental, obligately thermophilic Geobacillus strains, and also lays the foundation for advanced understanding of necessary sterilization protocols practiced in food, medical, pharmaceutical, and aerospace industries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tiwari, Ravi; Howieson, John; Yates, Ron; ...
2015-11-30
Bradyrhizobium sp. WSM1253 is a novel N2-fixing bacterium isolated from a root nodule of the herbaceous annual legume Ornithopus compressus that was growing on the Greek Island of Sifnos. WSM1253 emerged as a strain of interest in an Australian program that was selecting inoculant quality bradyrhizobial strains for inoculation of Mediterranean species of lupins (Lupinus angustifolius, L. princei, L. atlanticus, L. pilosus). In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 8,719,808 bp genome has a G + C content of 63.09% with 71 contigs arranged into two scaffolds. The assembled genome contains 8,432 protein-coding genes, 66 RNA genes and a single rRNA operon. In conclusion, this improved-high-quality draft rhizobial genome is one of 20 sequenced through a DOE Joint Genome Institute 2010 Community Sequencing Project.
Long, E O; Gross, N; Wake, C T; Mach, J P; Carrel, S; Accolla, R; Mach, B
1982-01-01
HLA-DR antigens are polymorphic cell surface glycoproteins, expressed primarily in B lymphocytes and macrophages, which are thought to play an important role in the immune response. Two polypeptide chains, alpha and beta, are associated at the cell surface, and a third chain associates with alpha and beta intracellularly. RNA isolated from the human B-cell line Raji was injected in Xenopus laevis oocytes. Immunoprecipitates of translation products with several monoclonal antibodies revealed the presence of HLA-DR antigens similar to those synthesized in Raji cells. One monoclonal antibody was able to bind the beta chain after dissociation of the three polypeptide chains with detergent. The presence of all three chains was confirmed by two-dimensional gel electrophoresis. The glycosylation pattern of the three chains was identical to that observed in vivo, as evidenced in studies using tunicamycin, an inhibitor of N-linked glycosylation. The presence of alpha chains assembled with beta chains in equimolar ratio was further demonstrated by amino-terminal sequencing. An RNA fraction enriched for the three mRNAs, encoding alpha, beta, and intracellular chains, was isolated. This translation-assembly system and the availability of monoclonal antibodies make it possible to assay for mRNA encoding specific molecules among the multiple human Ia-like antigens. PMID:6821356
LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher
2016-11-01
The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of g-spaced sequenced k-mers and the other holding k-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler Contact: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
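As a rough illustration of the data structure named in the abstract above, the following Python sketch stores a spaced sample of k-mers in a plain Bloom filter. It is not LightAssembler's implementation: the class `BloomFilter`, the helper `sample_kmers`, the SHA-256 based hashing, and the reading of "spaced" as "start positions a fixed gap apart" are all assumptions made for this example, and the filter is not cache-oblivious.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def sample_kmers(read, k, gap):
    """Take k-mers whose start positions are `gap` bases apart (assumed sampling scheme)."""
    return [read[i:i + k] for i in range(0, len(read) - k + 1, gap)]


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for kmer in sample_kmers("ACGTACGT", k=4, gap=2):
        bf.add(kmer)
    print("ACGT" in bf, "GTAC" in bf)
```

The memory saving comes from storing only bits rather than the k-mers themselves; the cost is a tunable false-positive rate, which is why a second filter of k-mers judged likely correct is a natural companion.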
Liu, X; Gorovsky, M A
1996-01-01
A truncated cDNA clone encoding Tetrahymena thermophila histone H2A2 was isolated using synthetic degenerate oligonucleotide probes derived from H2A protein sequences of Tetrahymena pyriformis. The cDNA clone was used as a homologous probe to isolate a truncated genomic clone encoding H2A1. The remaining regions of the genes for H2A1 (HTA1) and H2A2 (HTA2) were then isolated using inverse PCR on circularized genomic DNA fragments. These partial clones were assembled into intact HTA1 and HTA2 clones. Nucleotide sequences of the two genes were highly homologous within the coding region but not in the noncoding regions. Comparison of the deduced amino acid sequences with protein sequences of T. pyriformis H2As showed only two and three differences respectively, in a total of 137 amino acids for H2A1, and 132 amino acids for H2A2, indicating the two genes arose before the divergence of these two species. The HTA2 gene contains a TAA triplet within the coding region, encoding a glutamine residue. In contrast with the T. thermophila HHO and HTA3 genes, no introns were identified within the two genes. The 5'- and 3'-ends of the histone H2A mRNAs were determined by RNase protection and by PCR mapping using RACE and RLM-RACE methods. Both genes encode polyadenylated mRNAs and are highly expressed in vegetatively growing cells but only weakly expressed in starved cultures. With the inclusion of these two genes, T. thermophila is the first organism whose entire complement of known core and linker histones, including replication-dependent and basal variants, has been cloned and sequenced. PMID:8760889
Cornelissen, Marion; Gall, Astrid; Vink, Monique; Zorgdrager, Fokla; Binter, Špela; Edwards, Stephanie; Jurriaans, Suzanne; Bakker, Margreet; Ong, Swee Hoe; Gras, Luuk; van Sighem, Ard; Bezemer, Daniela; de Wolf, Frank; Reiss, Peter; Kellam, Paul; Berkhout, Ben; Fraser, Christophe; van der Kuyl, Antoinette C
2017-07-15
The BEEHIVE (Bridging the Evolution and Epidemiology of HIV in Europe) project aims to analyse nearly-complete viral genomes from >3000 HIV-1 infected Europeans using high-throughput deep sequencing techniques to investigate the virus genetic contribution to virulence. Following the development of a computational pipeline, including a new de novo assembler for RNA virus genomes, to generate larger contiguous sequences (contigs) from the abundance of short sequence reads that characterise the data, another area that determines genome sequencing success is the quality and quantity of the input RNA. A pilot experiment with 125 patient plasma samples was performed to investigate the optimal method for isolation of HIV-1 viral RNA for long amplicon genome sequencing. Manual isolation with the QIAamp Viral RNA Mini Kit (Qiagen) was superior to robotic extraction using either the QIAcube robotic system, the mSample Preparation Systems RNA kit with automated extraction by the m2000sp system (Abbott Molecular), or the MagNA Pure 96 System in combination with the MagNA Pure 96 Instrument (Roche Diagnostics). We scored amplification of a set of four HIV-1 amplicons of ∼1.9, 3.6, 3.0 and 3.5 kb, and subsequent recovery of near-complete viral genomes. Subsequently, 616 BEEHIVE patient samples were analysed to determine factors that influence successful amplification of the genome in four overlapping amplicons using the QIAamp Viral RNA Kit for viral RNA isolation. Both low plasma viral load and high sample age (stored before 1999) negatively influenced the amplification of viral amplicons >3 kb. A plasma viral load of >100,000 copies/ml resulted in successful amplification of all four amplicons for 86% of the samples; this value dropped to only 46% for samples with viral loads of <20,000 copies/ml. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana
2016-07-01
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Lin, You-Yu; Hsieh, Chia-Hung; Chen, Jiun-Hong; Lu, Xuemei; Kao, Jia-Horng; Chen, Pei-Jer; Chen, Ding-Shinn; Wang, Hurng-Yi
2017-04-26
The accuracy of metagenomic assembly is usually compromised by high levels of polymorphism, because divergent reads from the same genomic region are recognized as different loci when sequenced and assembled together. A viral quasispecies is a group of abundant and diversified genetically related viruses found in a single carrier. Current mainstream assembly methods, such as Velvet and SOAPdenovo, were not originally intended for the assembly of such metagenomic data, and new methods are therefore needed to provide accurate and informative assembly results for metagenomic data. In this study, we present a hybrid method for assembling highly polymorphic data combining the partial de novo-reference assembly (PDR) strategy and the BLAST-based assembly pipeline (BBAP). The PDR strategy generates in situ reference sequences through de novo assembly of a randomly extracted partial data set, which is subsequently used for the reference assembly of the full data set. BBAP employs a greedy algorithm to assemble polymorphic reads. We used 12 hepatitis B virus quasispecies NGS data sets from a previous study to assess and compare the performance of both PDR and BBAP. Analyses suggest that the high polymorphism of a full metagenomic data set leads to fragmented de novo assembly results, whereas the biased or limited representation of external reference sequences included fewer reads in the assembly, with lower assembly accuracy and variation sensitivity. In comparison, the PDR-generated in situ reference sequence incorporated more reads into the final PDR assembly of the full metagenomic data set, along with greater accuracy and higher variation sensitivity. BBAP assembly results also suggest higher assembly efficiency and accuracy compared to other assembly methods. Additionally, BBAP assembly recovered HBV structural variants that were not observed amongst assembly results of other methods. Together, PDR/BBAP assembly results were significantly better than other compared methods. 
Both PDR and BBAP independently increased the assembly efficiency and accuracy of highly polymorphic data, and assembly performances were further improved when used together. BBAP also provides nucleotide frequency information. Together, PDR and BBAP provide powerful tools for metagenomic data studies.
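The two ideas described in the abstract above, assembling a randomly extracted partial data set and merging reads greedily, can be sketched in Python as follows. This is only a toy under stated assumptions: the names `partial_subset`, `overlap` and `greedy_assemble` are hypothetical, overlaps are exact string matches rather than BLAST alignments, and nothing here reproduces the actual PDR or BBAP pipelines.

```python
import random


def partial_subset(reads, fraction, seed=0):
    """Randomly extract a fraction of the reads (the 'partial data set' idea)."""
    size = max(1, int(len(reads) * fraction))
    return random.Random(seed).sample(reads, size)


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(greedy_assemble(reads))
```

In a PDR-like workflow the contigs assembled from `partial_subset(...)` would serve as the in situ reference against which the full read set is then mapped; that mapping step is not shown here.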
Comparative Analysis of the First Complete Enterococcus faecium Genome
Lam, Margaret M. C.; Seemann, Torsten; Bulach, Dieter M.; Gladman, Simon L.; Chen, Honglei; Haring, Volker; Moore, Robert J.; Ballard, Susan; Grayson, M. Lindsay; Johnson, Paul D. R.; Howden, Benjamin P.
2012-01-01
Vancomycin-resistant enterococci (VRE) are one of the leading causes of nosocomial infections in health care facilities around the globe. In particular, infections caused by vancomycin-resistant Enterococcus faecium are becoming increasingly common. Comparative and functional genomic studies of E. faecium isolates have so far been limited owing to the lack of a fully assembled E. faecium genome sequence. Here we address this issue and report the complete 3.0-Mb genome sequence of the multilocus sequence type 17 vancomycin-resistant Enterococcus faecium strain Aus0004, isolated from the bloodstream of a patient in Melbourne, Australia, in 1998. The genome comprises a 2.9-Mb circular chromosome and three circular plasmids. The chromosome harbors putative E. faecium virulence factors such as enterococcal surface protein, hemolysin, and collagen-binding adhesin. Aus0004 has a very large accessory genome (38%) that includes three prophage and two genomic islands absent among 22 other E. faecium genomes. One of the prophage was present as inverted 50-kb repeats that appear to have facilitated a 683-kb chromosomal inversion across the replication terminus, resulting in a striking replichore imbalance. Other distinctive features include 76 insertion sequence elements and a single chromosomal copy of Tn1549 containing the vanB vancomycin resistance element. A complete E. faecium genome will be a useful resource to assist our understanding of this emerging nosocomial pathogen. PMID:22366422
Duarte, M.; Sousa, R.; Videira, A.
1995-01-01
We have isolated and characterized the nuclear genes encoding the 12.3-kD subunit of the membrane arm and the 29.9-kD subunit of the peripheral arm of complex I from Neurospora crassa. The former gene was known to be located in linkage group I and the latter is now assigned to linkage group IV of the fungal genome. The genes were separately transformed into different N. crassa strains and transformants with duplicated DNA sequences were isolated. Selected transformants were then mated with other strains to generate repeat-induced point mutations in both copies of the genes present in the nucleus of the parental transformant. From the progeny of the crosses, we were then able to recover two individual mutants lacking the 12.3- and 29.9-kD proteins in their mitochondria, mutants nuo12.3 and nuo29.9, respectively. Several other subunits of complex I are present in the mutant organelles, although with altered stoichiometries as compared with those in the wild-type strain. Based on the analysis of Triton-solubilized mitochondrial complexes in sucrose gradients, neither mutant is able to fully assemble complex I. Our results indicate that mutant nuo12.3 separately assembles the peripheral arm and most of the membrane arm of the enzyme. Mutant nuo29.9 seems to accumulate the membrane arm of complex I while being devoid of the peripheral part. This implicates the 29.9-kD protein in an early step of complex I assembly. PMID:7768434
Kabani, Mehdi; Melki, Ronald
2011-01-01
Yeast prions are self-perpetuating protein aggregates that are at the origin of heritable and transmissible non-Mendelian phenotypic traits. Among these, [PSI+], [URE3] and [PIN+] are the best documented prions and arise from the assembly of Sup35p, Ure2p and Rnq1p, respectively, into insoluble fibrillar assemblies. Fibril assembly depends on the presence of N- or C-terminal prion domains (PrDs) which are not homologous in sequence but share unusual amino-acid compositions, such as enrichment in polar residues (glutamines and asparagines) or the presence of oligopeptide repeats. Purified PrDs form amyloid fibrils that can convert prion-free cells to the prion state upon transformation. Nonetheless, isolated PrDs and full-length prion proteins have different aggregation, structural and infectious properties. In addition, mutations in the "non-prion" domains (non-PrDs) of Sup35p, Ure2p and Rnq1p were shown to affect their prion properties in vitro and in vivo. Despite this evidence, the involvement of the functional non-PrDs in fibril assembly and prion propagation has been mostly overlooked. In this review, we discuss the contribution of non-PrDs to prion assemblies, and the structure-function relationship in prion infectivity in the light of recent findings on Sup35p and Ure2p assembly into infectious fibrils from our laboratory and others.
Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers
Wajid, Bilal; Serpedin, Erchin
2012-01-01
In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview on the art of problem-solving applied within the domain of genome assembly in the next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial to all working on genome assemblers pertaining to the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also briefly discuss the direction in which the area is headed and core issues of software simplicity. PMID:22768980
Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis
Mason, Annaliese S.; Rousseau-Gueutin, Mathieu; Morice, Jérôme; Bayer, Philipp E.; Besharat, Naghmeh; Cousin, Anouska; Pradhan, Aneeta; Parkin, Isobel A. P.; Chèvre, Anne-Marie; Batley, Jacqueline; Nelson, Matthew N.
2016-01-01
Locating centromeres on genome sequences can be challenging. The high density of repetitive elements in these regions makes sequence assembly problematic, especially when using short-read sequencing technologies. It can also be difficult to distinguish between active and recently extinct centromeres through sequence analysis. An effective solution is to identify genetically active centromeres (functional in meiosis) by half-tetrad analysis. This genetic approach involves detecting heterozygosity along chromosomes in segregating populations derived from gametes (half-tetrads). Unreduced gametes produced by first division restitution mechanisms comprise complete sets of nonsister chromatids. Along these chromatids, heterozygosity is maximal at the centromeres, and homologous recombination events result in homozygosity toward the telomeres. We genotyped populations of half-tetrad-derived individuals (from Brassica interspecific hybrids) using a high-density array of physically anchored SNP markers (Illumina Brassica 60K Infinium array). Mapping the distribution of heterozygosity in these half-tetrad individuals allowed the genetic mapping of all 19 centromeres of the Brassica A and C genomes to the reference Brassica napus genome. Gene and transposable element density across the B. napus genome were also assessed and corresponded well to previously reported genetic map positions. Known centromere-specific sequences were located in the reference genome, but mostly matched unanchored sequences, suggesting that the core centromeric regions may not yet be assembled into the pseudochromosomes of the reference genome. The increasing availability of genetic markers physically anchored to reference genomes greatly simplifies the genetic and physical mapping of centromeres using half-tetrad analysis. We discuss possible applications of this approach, including in species where half-tetrads are currently difficult to isolate. PMID:26614742
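The mapping idea in the abstract above (heterozygosity in half-tetrad individuals peaks at the centromere and declines toward the telomeres) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `centromere_interval`, the boolean genotype encoding, and the choice to report the span of markers with the highest heterozygous fraction are all assumptions made here.

```python
from typing import List, Optional, Sequence, Tuple


def centromere_interval(
    positions: Sequence[int],
    calls: Sequence[Sequence[Optional[bool]]],
) -> Tuple[int, int]:
    """Return the physical span of the markers with maximal heterozygosity.

    positions: marker coordinates on one chromosome, in ascending order.
    calls[i][j]: True if individual i is heterozygous at marker j,
                 False if homozygous, None if the call is missing.
    """
    fractions: List[float] = []
    for j in range(len(positions)):
        # Ignore missing calls when computing the heterozygous fraction.
        observed = [row[j] for row in calls if row[j] is not None]
        fractions.append(sum(observed) / len(observed) if observed else 0.0)

    best = max(fractions)
    peak = [positions[j] for j, f in enumerate(fractions) if f == best]
    return min(peak), max(peak)
```

With real array data the peak would normally be smoothed or bounded by the nearest recombination events; the sketch only shows the core "find where heterozygosity is maximal" step.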
Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis.
Mason, Annaliese S; Rousseau-Gueutin, Mathieu; Morice, Jérôme; Bayer, Philipp E; Besharat, Naghmeh; Cousin, Anouska; Pradhan, Aneeta; Parkin, Isobel A P; Chèvre, Anne-Marie; Batley, Jacqueline; Nelson, Matthew N
2016-02-01
Locating centromeres on genome sequences can be challenging. The high density of repetitive elements in these regions makes sequence assembly problematic, especially when using short-read sequencing technologies. It can also be difficult to distinguish between active and recently extinct centromeres through sequence analysis. An effective solution is to identify genetically active centromeres (functional in meiosis) by half-tetrad analysis. This genetic approach involves detecting heterozygosity along chromosomes in segregating populations derived from gametes (half-tetrads). Unreduced gametes produced by first division restitution mechanisms comprise complete sets of nonsister chromatids. Along these chromatids, heterozygosity is maximal at the centromeres, and homologous recombination events result in homozygosity toward the telomeres. We genotyped populations of half-tetrad-derived individuals (from Brassica interspecific hybrids) using a high-density array of physically anchored SNP markers (Illumina Brassica 60K Infinium array). Mapping the distribution of heterozygosity in these half-tetrad individuals allowed the genetic mapping of all 19 centromeres of the Brassica A and C genomes to the reference Brassica napus genome. Gene and transposable element density across the B. napus genome were also assessed and corresponded well to previously reported genetic map positions. Known centromere-specific sequences were located in the reference genome, but mostly matched unanchored sequences, suggesting that the core centromeric regions may not yet be assembled into the pseudochromosomes of the reference genome. The increasing availability of genetic markers physically anchored to reference genomes greatly simplifies the genetic and physical mapping of centromeres using half-tetrad analysis. We discuss possible applications of this approach, including in species where half-tetrads are currently difficult to isolate. Copyright © 2016 by the Genetics Society of America.
Geisler, Christoph
2018-02-07
Adventitious viral contamination in cell substrates used for biologicals production is a major safety concern. A powerful new approach that can be used to identify adventitious viruses is a combination of bioinformatics tools with massively parallel sequencing technology. Typically, this involves mapping or BLASTN searching individual reads against viral nucleotide databases. Although extremely sensitive for known viruses, this approach can easily miss viruses that are too dissimilar to viruses in the database. Moreover, it is computationally intensive and requires reference cell genome databases. To avoid these drawbacks, we set out to develop an alternative approach. We reasoned that searching genome and transcriptome assemblies for adventitious viral contaminants using TBLASTN with a compact viral protein database covering extant viral diversity as the query could be fast and sensitive without a requirement for high performance computing hardware. We tested our approach on Spodoptera frugiperda Sf-RVN, a recently isolated insect cell line, to determine if it was contaminated with one or more adventitious viruses. We used Illumina reads to assemble the Sf-RVN genome and transcriptome and searched them for adventitious viral contaminants using TBLASTN with our viral protein database. We found no evidence of viral contamination, which was substantiated by the fact that our searches otherwise identified diverse sequences encoding virus-like proteins. These sequences included Maverick, R1 LINE, and errantivirus transposons, all of which are common in insect genomes. We also identified previously described as well as novel endogenous viral elements similar to ORFs encoded by diverse insect viruses. 
Our results demonstrate TBLASTN searching massively parallel sequencing (MPS) assemblies with a compact, manually curated viral protein database is more sensitive for adventitious virus detection than BLASTN, as we identified various sequences that encoded virus-like proteins, but had no similarity to viral sequences at the nucleotide level. Moreover, searches were fast without requiring high performance computing hardware. Our study also documents the enhanced biosafety profile of Sf-RVN as compared to other Sf cell lines, and supports the notion that Sf-RVN is highly suitable for the production of safe biologicals.
Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Jones, Daniel C.; Ruzzo, Walter L.; Peng, Xinxia
2012-01-01
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip. PMID:22904078
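The assembly-based idea described above, storing a read as a position within an assembled contig, can be illustrated with a toy Python sketch. It is not Quip's algorithm or file format: the names `encode_reads` and `decode_reads`, the exact-match requirement, and the literal fallback for unplaced reads are simplifying assumptions for illustration only.

```python
from typing import List, Tuple, Union

# An encoded read is either a (contig_index, offset, length) triple
# or the literal read string when no contig contains it.
Encoded = Union[Tuple[int, int, int], str]


def encode_reads(reads: List[str], contigs: List[str]) -> List[Encoded]:
    """Replace each read by its position in a contig when an exact match exists."""
    encoded: List[Encoded] = []
    for read in reads:
        for idx, contig in enumerate(contigs):
            offset = contig.find(read)
            if offset != -1:
                encoded.append((idx, offset, len(read)))
                break
        else:
            # No contig contains this read: keep it verbatim so nothing is lost.
            encoded.append(read)
    return encoded


def decode_reads(encoded: List[Encoded], contigs: List[str]) -> List[str]:
    """Invert encode_reads, recovering the original read sequences."""
    reads: List[str] = []
    for item in encoded:
        if isinstance(item, tuple):
            idx, offset, length = item
            reads.append(contigs[idx][offset:offset + length])
        else:
            reads.append(item)
    return reads
```

The round trip `decode_reads(encode_reads(reads, contigs), contigs)` returns the input reads unchanged, which is the lossless property the abstract emphasizes; the space saving comes from three small integers replacing a full read sequence.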
Reference-guided assembly of four diverse Arabidopsis thaliana genomes
Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef
2011-01-01
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520
Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun
2013-01-01
Background: Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton.
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M
2018-02-14
Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
Aguilera-Cogley, Vidal Antonio; Berbegal, Mónica; Català, Santiago; Brentu, Francis Collison; Armengol, Josep
2017-01-01
Greasy spot of citrus, caused by Zasmidium citri-griseum (= Mycosphaerella citri), is widely distributed in the Caribbean Basin, inducing leaf spots, premature defoliation, and yield loss. Greasy spot-like symptoms were frequently observed in humid citrus-growing regions in Panama as well as in semi-arid areas in Spain, but disease aetiology was unknown. Citrus-growing areas in Panama and Spain were surveyed and isolates of Mycosphaerellaceae were obtained from citrus greasy spot lesions. A selection of isolates from Panama (n = 22) and Spain (n = 16) was assembled based on their geographical origin, citrus species, and affected tissue. The isolates were characterized based on multi-locus DNA (ITS and EF-1α) sequence analyses, morphology, growth at different temperatures, and independent pathogenicity tests on the citrus species most affected in each country. Reference isolates and sequences were also included in the analysis. Isolates from Panama were identified as Z. citri-griseum complex, and others from Spain attributed to Amycosphaerella africana. Isolates of the Z. citri-griseum complex had a significantly higher optimal growth temperature (26.8°C) than those of A. africana (19.3°C), which corresponded well with their actual biogeographical range. The isolates of the Z. citri-griseum complex from Panama induced typical greasy spot symptoms in ‘Valencia’ sweet orange plants and the inoculated fungi were reisolated. No symptoms were observed in plants of the ‘Ortanique’ tangor inoculated with A. africana. These results demonstrate the presence of citrus greasy spot, caused by Z. citri-griseum complex, in Panama whereas A. africana was associated with greasy spot-like symptoms in Spain. PMID:29236789
Rao, Soumya; Nandineni, Madhusudan R
2017-01-01
Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.
Stabej, P; Leegwater, P A J; Imholz, S; Versteeg, S A; Zijlstra, C; Stokhof, A A; Domanjko-Petrič, A; van Oost, B A
2005-01-01
Dilated cardiomyopathy (DCM) is a common disease of the myocardium recognized in humans, dogs and experimental animals. Genetic factors are responsible for a large proportion of cases in humans, and 17 genes with DCM-causing mutations have been identified. A genetic origin of DCM in Dobermann dogs has been suggested, but no disease genes have been identified to date. In this paper, we describe the characterization and evaluation of the canine sarcoglycan delta (SGCD), a gene implicated in DCM in human and hamster. Bacterial artificial chromosomes (BACs) containing the canine SGCD gene were isolated with probes for exon 3 and exons 4-8 and were characterized by Southern blot analysis. BAC end sequences were obtained for four BACs. Three of the BACs overlapped and could be ordered relative to each other and the end sequences of all four BACs could be anchored on the preliminary assembly of the dog genome sequence (www.ensembl.org). One of the BACs of the partial contig was localized by fluorescent in situ hybridization to canine chromosome 4q22, in agreement with the dog genome sequence. Two highly informative polymorphic microsatellite markers in intron 7 of the SGCD gene were identified. In 25 DCM-affected and 13 non-DCM-affected dogs, seven different haplotypes could be distinguished. However, no association between any of the SGCD variants and the disease locus was apparent.
Rao, Soumya
2017-01-01
Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens. PMID:28846714
A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome
Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin; ...
2015-01-31
We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin
We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Puthoff, D P; Neelam, A; Ehrenfried, M L; Scheffler, B E; Ballard, L; Song, Q; Campbell, K B; Cooper, B; Tucker, M L
2008-10-01
Hyphae, 2 to 8 days postinoculation (dpi), and haustoria, 5 dpi, were isolated from Uromyces appendiculatus infected bean leaves (Phaseolus vulgaris cv. Pinto 111) and a separate cDNA library prepared for each fungal preparation. Approximately 10,000 hyphae and 2,700 haustoria clones were sequenced from both the 5' and 3' ends. Assembly of all of the fungal sequences yielded 3,359 contigs and 927 singletons. The U. appendiculatus sequences were compared with sequence data for other rust fungi, Phakopsora pachyrhizi, Uromyces fabae, and Puccinia graminis. The U. appendiculatus haustoria library included a large number of genes with unknown cellular function; however, summation of sequences of known cellular function suggested that haustoria at 5 dpi had fewer transcripts linked to protein synthesis in favor of energy metabolism and nutrient uptake. In addition, open reading frames in the U. appendiculatus data set with an N-terminal signal peptide were identified and compared with other proteins putatively secreted from rust fungi. In this regard, a small family of putatively secreted RTP1-like proteins was identified in U. appendiculatus and P. graminis.
USDA-ARS's Scientific Manuscript database
Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Graph mining for next generation sequencing: leveraging the assembly graph for biological insights.
Warnke-Sommer, Julia; Ali, Hesham
2016-05-06
The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn's disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn's disease and healthy data sets was conducted with focus on antibiotics resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotics resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers.
Mining the hybrid graph can reveal biological phenomena captured by its structure. We demonstrate the advantages of considering assembly graphs as data-mining support in addition to their role as frameworks for assembly.
Li, Xi; Zhu, Yongze; Shen, Mengyuan; Du, Jing; Zhang, Lei; Wang, Dairong
2018-03-01
Enterobacter cloacae is one of the major pathogens responsible for a variety of human infections. Here we report the draft genome sequence of multidrug-resistant E. cloacae strain HBY isolated from a female patient in China. Whole genomic DNA of E. cloacae strain HBY was extracted and was sequenced using an Illumina HiSeq™ 2000 platform. The generated sequence reads were assembled using CLC Genomics Workbench. The draft genome was annotated using Rapid Annotations using Subsystems Technology (RAST), and the presence of antimicrobial resistance genes was identified. The 5799439-bp genome contains various antimicrobial resistance genes conferring resistance to aminoglycosides, β-lactams, fosfomycin, macrolides, sulphonamides and fluoroquinolones. Notably, the strain was identified to carry two main carbapenemase genes (blaKPC-2 and blaNDM-1). The genome sequence reported in this study will provide valuable information to understand antibiotic resistance mechanisms in this strain. It is important to monitor the spread of Enterobacter sp. strains encoding both of these carbapenemase genes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.
Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A
2017-11-28
Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers.
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, the SPAdes assembler and MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity result in the highest-quality metagenome-assembled genomes.
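The contig-continuity statistic quoted in this abstract, N50, can be computed from contig lengths alone. The following is a minimal sketch, assuming the common definition (the contig length L such that contigs of length L or longer cover at least half of all assembled bases). The function name and the sample lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (common definition)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about chimeric contigs, which is presumably why the study pairs it with a chimerism measure.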
ABACAS: algorithm-based automatic contiguation of assembled sequences
Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew
2009-01-01
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936
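The ordering and orientation step described in this abstract can be illustrated with a short sketch. It assumes each contig has already been aligned to the reference by some external aligner, and that the alignment is summarized as a hypothetical `(ref_start, strand)` pair; it is not ABACAS's own implementation (which is written in Perl), and all names here are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def contiguate(contigs, hits):
    """Order contigs by reference start and orient minus-strand contigs.

    contigs: {name: sequence}
    hits:    {name: (ref_start, strand)} with strand "+" or "-"
             (hypothetical summary of an alignment done elsewhere)
    """
    layout = []
    for name in sorted(hits, key=lambda n: hits[n][0]):
        seq = contigs[name]
        if hits[name][1] == "-":
            seq = reverse_complement(seq)
        layout.append((name, seq))
    return layout


# Toy input, for illustration only.
contigs = {"c1": "AAGT", "c2": "CCGA"}
hits = {"c1": (500, "-"), "c2": (10, "+")}
print(contiguate(contigs, hits))
```

Contigs with no alignment are simply absent from `hits` in this sketch; how a real tool reports them is outside what the abstract states.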
Genome Sequencing and Assembly by Long Reads in Plants
Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong
2017-01-01
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Metagenomic Assembly: Overview, Challenges and Applications
Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai
2016-01-01
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619
Solving Assembly Sequence Planning using Angle Modulated Simulated Kalman Filter
NASA Astrophysics Data System (ADS)
Mustapa, Ainizar; Yusof, Zulkifli Md.; Adam, Asrul; Muhammad, Badaruddin; Ibrahim, Zuwairie
2018-03-01
This paper presents an implementation of the Simulated Kalman Filter (SKF) algorithm for optimizing an Assembly Sequence Planning (ASP) problem. The SKF search strategy contains three simple steps: predict, measure, and estimate. The main objective of ASP is to determine the sequence of component installation that shortens assembly time or saves assembly costs. Initially, a permutation sequence is generated to represent each agent. Each agent is then subjected to a precedence matrix constraint to produce a feasible assembly sequence. Next, the Angle Modulated SKF (AMSKF) is proposed for solving the ASP problem. The main idea of the angle modulated approach in solving combinatorial optimization problems is to use a function, g(x), to create a continuous signal. The performance of the proposed AMSKF is compared against previous works that solve ASP by applying BGSA, BPSO, and MSPSO. Using a case study of ASP, the results show that AMSKF outperformed all the algorithms in obtaining the best solution.
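The precedence-matrix constraint mentioned in this abstract can be illustrated with a small feasibility check. This is a sketch under an assumed convention, namely that `precedence[i][j] == 1` means component `i` must be installed before component `j`; the paper's own matrix convention and its method for repairing infeasible sequences are not given in the abstract.

```python
def is_feasible(sequence, precedence):
    """Check a permutation of component indices against a precedence matrix.

    Assumed convention (not from the source): precedence[i][j] == 1 means
    component i must appear before component j in the assembly sequence.
    """
    position = {part: idx for idx, part in enumerate(sequence)}
    n = len(precedence)
    for i in range(n):
        for j in range(n):
            if precedence[i][j] and position[i] > position[j]:
                return False
    return True


# Toy case: component 0 before 1, and 1 before 2.
toy_precedence = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(is_feasible([0, 1, 2], toy_precedence))
print(is_feasible([1, 0, 2], toy_precedence))
```

A search agent that encodes a permutation can call such a check before its sequence is scored on assembly time or cost.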
An Enterobacter plasmid as a new genetic background for the transposon Tn1331
Alavi, Mohammad R; Antonic, Vlado; Ravizee, Adrien; Weina, Peter J; Izadjoo, Mina; Stojadinovic, Alexander
2011-01-01
Background Genus Enterobacter includes important opportunistic nosocomial pathogens that could infect complex wounds. The presence of antibiotic resistance genes in these microorganisms represents a challenging clinical problem in the treatment of these wounds. In the authors’ screening of antibiotic-resistant bacteria from complex wounds, an Enterobacter species was isolated that harbors antibiotic-resistant plasmids conferring resistance to Escherichia coli. The aim of this study was to identify the resistance genes carried by one of these plasmids. Methods The plasmids from the Enterobacter isolate were propagated in E. coli and one of the plasmids, designated as pR23, was sequenced by the Sanger method using fluorescent dye-terminator chemistry on a genetic analyzer. The assembled sequence was annotated by search of the GenBank database. Results Plasmid pR23 is composed of the transposon Tn1331 and a backbone plasmid that is identical to the plasmid pPIGDM1 from Enterobacter agglomerans. The multidrug-resistance transposon Tn1331, which confers resistance to aminoglycoside and beta lactam antibiotics, has been previously isolated only from Klebsiella. The Enterobacter plasmid pPIGDM1, which carries a ColE1-like origin of replication and has no apparent selective marker, appears to provide a backbone for propagation of Tn1331 in Enterobacter. The recognition sequence of Tn1331 transposase for insertion into pPIGDM1 is the pentanucleotide TATTA, which occurs only once throughout the length of this plasmid. Conclusion Transposition of Tn1331 into the Enterobacter plasmid pPIGDM1 enables this transposon to propagate in this Enterobacter. Since Tn1331 was previously isolated only from Klebsiella, this report suggests horizontal transfer of this transposon between the two bacterial genera. PMID:22259249
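A statement such as "the pentanucleotide TATTA occurs only once" can be checked with a simple motif count. The sketch below counts overlapping occurrences on the strand given; the toy sequence is made up and is not the pPIGDM1 sequence, and whether the authors also scanned the complementary strand is not stated in the abstract.

```python
def count_motif(sequence, motif):
    """Count (possibly overlapping) occurrences of motif on the given strand."""
    sequence, motif = sequence.upper(), motif.upper()
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping matches are also counted.
        start = sequence.find(motif, start + 1)
    return count


# Hypothetical sequence, for illustration only.
toy_sequence = "GGCTATTACCGT"
print(count_motif(toy_sequence, "TATTA"))
```

For a circular plasmid, a real check would also need to look across the junction where the linearized sequence wraps around; this sketch does not do that.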
Next Generation Sequence Assembly with AMOS
Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai
2011-01-01
A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. PMID:21400694
Reference-guided de novo assembly approach improves genome reconstruction for related species.
Lischer, Heidi E L; Shimizu, Kentaro K
2017-11-10
The development of next-generation sequencing has made it possible to sequence whole genomes at a relatively low cost. However, de novo genome assemblies remain challenging due to short read length, missing data, repetitive regions, polymorphisms and sequencing errors. As more and more genomes are sequenced, reference-guided assembly approaches can be used to assist the assembly process. However, previous methods mostly focused on the assembly of other genotypes within the same species. We adapted and extended a reference-guided de novo assembly approach, which enables the usage of a related reference sequence to guide the genome assembly. In order to compare and evaluate de novo and our reference-guided de novo assembly approaches, we used a simulated data set of a repetitive and heterozygotic plant genome. The extended reference-guided de novo assembly approach almost always outperforms the corresponding de novo assembly program even when a reference of a different species is used. Similar improvements can be observed in high and low coverage situations. In addition, we show that a single evaluation metric, like the widely used N50 length, is not enough to properly rate assemblies, as it does not always point to the best assembly evaluated with other criteria. Therefore, we used the summed z-scores of 36 different statistics to evaluate the assemblies. The combination of reference mapping and de novo assembly provides a powerful tool to improve genome reconstruction by integrating information of a related genome. Our extension of the reference-guided de novo assembly approach enables the application of this strategy not only within but also between related species. Finally, the evaluation of genome assemblies is often not straightforward, as the truth is not known. Thus one should always use a combination of evaluation metrics, which not only try to assess the continuity but also the accuracy of an assembly.
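The idea of ranking assemblies by summed z-scores rather than by N50 alone can be sketched as follows. This is an illustration under stated assumptions: the two metrics, their values, the use of the population standard deviation, and the sign flip for "lower is better" metrics are all choices made here, not details taken from the paper, which used 36 statistics.

```python
from statistics import mean, pstdev


def summed_z_scores(metrics, higher_is_better):
    """Sum per-metric z-scores for each assembly.

    metrics:          {metric_name: {assembly_name: value}}
    higher_is_better: {metric_name: bool}; z-scores of metrics where lower
                      is better are negated so that larger totals are better.
    """
    totals = {}
    for metric, values in metrics.items():
        data = list(values.values())
        mu, sigma = mean(data), pstdev(data)
        sign = 1.0 if higher_is_better[metric] else -1.0
        for assembly, value in values.items():
            z = 0.0 if sigma == 0 else (value - mu) / sigma
            totals[assembly] = totals.get(assembly, 0.0) + sign * z
    return totals


# Hypothetical values: assembly B has the best N50 but the most misassemblies.
metrics = {
    "n50": {"A": 1000, "B": 3000, "C": 2000},
    "misassemblies": {"A": 5, "B": 20, "C": 5},
}
higher_is_better = {"n50": True, "misassemblies": False}
totals = summed_z_scores(metrics, higher_is_better)
print(max(totals, key=totals.get))
```

In this toy case the assembly with the largest N50 is not the one with the best combined score, which mirrors the abstract's point that a single continuity metric can be misleading.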
Planning Assembly Of Large Truss Structures In Outer Space
NASA Technical Reports Server (NTRS)
De Mello, Luiz S. Homem; Desai, Rajiv S.
1992-01-01
Report discusses developmental algorithm used in systematic planning of sequences of operations in which large truss structures assembled in outer space. Assembly sequence represented by directed graph called "assembly graph", in which each arc represents joining of two parts or subassemblies. Algorithm generates assembly graph, working backward from state of complete assembly to initial state, in which all parts disassembled. Working backward more efficient than working forward because it avoids intermediate dead ends.
Mataseje, L F; Boyd, D A; Lefebvre, B; Bryce, E; Embree, J; Gravel, D; Katz, K; Kibsey, P; Kuhn, M; Langley, J; Mitchell, R; Roscoe, D; Simor, A; Taylor, G; Thomas, E; Turgeon, N; Mulvey, M R
2014-03-01
Emergence of plasmids harbouring bla(NDM-1) is a major public health concern due to their association with multidrug resistance and their potential mobility. PCR was used to detect bla(NDM-1) from clinical isolates of Providencia rettgeri (PR) and Klebsiella pneumoniae (KP). Antimicrobial susceptibilities were determined using Vitek 2. The complete DNA sequence of two bla(NDM-1) plasmids (pPrY2001 and pKp11-42) was obtained using a 454-Genome Sequencer FLX. Contig assembly and gap closures were confirmed by PCR-based sequencing. Comparative analysis was done using BLASTn and BLASTp algorithms. Both clinical isolates were resistant to all β-lactams, carbapenems, aminoglycosides, ciprofloxacin and trimethoprim/sulfamethoxazole, and susceptible to tigecycline. Plasmid pPrY2001 (113 295 bp) was isolated from PR. It did not show significant homology to any known plasmid backbone and contained a truncated repA and novel repB. Two bla(NDM-1)-harbouring plasmids from Acinetobacter lwoffii (JQ001791 and JQ060896) shared 100% similarity to a 15 kb region that contained bla(NDM-1). pPrY2001 also contained a type II toxin/antitoxin system. pKp11-42 (146 695 bp) was isolated from KP. It contained multiple repA genes. The plasmid backbone had the highest homology to the IncFIIk plasmid type (51% coverage, 100% nucleotide identity). The bla(NDM-1) region was unique in that it was flanked upstream by IS3000 and downstream by a novel transposon designated Tn6229. pKp11-42 also contained a number of mutagenesis and plasmid stability proteins. pPrY2001 differed from all known plasmids due to its novel backbone and repB. pKp11-42 was similar to IncFIIk plasmids and contained a number of genes that aid in plasmid persistence.
Transcriptome deep-sequencing and clustering of expressed isoforms from Favia corals
2013-01-01
Background Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea. Results We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved 1-3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for fluorescent proteins (FPs) homologs and identified seven potential FP homologs in Fav1, and four in Fav2. These transcripts were validated as bona fide FP transcripts via robust fluorescence heterologous expression. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp. Conclusions Here we present a study to identify the homologous transcript isoform clusters from the transcriptome of Favia corals using a far-related reference proteome. 
Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution quantified. This is the first annotated transcriptome of the genus Favia, a major increase in genomics resources available in this important family of corals. PMID:23937070
Gradient isolator for flow field of fuel cell assembly
Ernst, W.D.
1999-06-15
Isolator(s) include isolating material and optionally gasketing material strategically positioned within a fuel cell assembly. The isolating material is disposed between a solid electrolyte and a metal flow field plate. Reactant fluid carried by flow field plate channel(s) forms a generally transverse electrochemical gradient. The isolator(s) serve to isolate electrochemically a portion of the flow field plate, for example, transversely outward from the channel(s), from the electrochemical gradient. Further, the isolator(s) serve to protect a portion of the solid electrolyte from metallic ions. 4 figs.
Gradient isolator for flow field of fuel cell assembly
Ernst, William D.
1999-01-01
Isolator(s) include isolating material and optionally gasketing material strategically positioned within a fuel cell assembly. The isolating material is disposed between a solid electrolyte and a metal flow field plate. Reactant fluid carried by flow field plate channel(s) forms a generally transverse electrochemical gradient. The isolator(s) serve to isolate electrochemically a portion of the flow field plate, for example, transversely outward from the channel(s), from the electrochemical gradient. Further, the isolator(s) serve to protect a portion of the solid electrolyte from metallic ions.
Olson, Nathan D; Treangen, Todd J; Hill, Christopher M; Cepeda-Espinoza, Victoria; Ghurye, Jay; Koren, Sergey; Pop, Mihai
2017-08-07
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation. © The Author 2017. Published by Oxford University Press.
A RESTful application programming interface for the PubMLST molecular typing and genome databases
Bray, James E.; Maiden, Martin C. J.
2017-01-01
Abstract Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
Brumm, Phillip; Land, Miriam L; Hauser, Loren J; Jeffries, Cynthia D; Chang, Yun-Juan; Mead, David A
2015-01-01
Geobacillus sp. Y412MC52 was isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. The genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp, an average G + C content of 52 % and one circular plasmid of 45,057 bp and an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Transport and utilization clusters are also present for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
Brumm, Phillip; Land, Miriam L.; Hauser, Loren J.; ...
2015-10-19
We isolated Geobacillus sp. Y412MC52 from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. Moreover, the genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp, an average G + C content of 52 % and one circular plasmid of 45,057 bp and an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Finally, we present transport and utilization clusters for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brumm, Phillip; Land, Miriam L.; Hauser, Loren J.
We isolated Geobacillus sp. Y412MC52 from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. Moreover, the genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp, an average G + C content of 52 % and one circular plasmid of 45,057 bp and an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Finally, we present transport and utilization clusters for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
Dynamic peptide libraries for the discovery of supramolecular nanomaterials
NASA Astrophysics Data System (ADS)
Pappas, Charalampos G.; Shafi, Ramim; Sasselli, Ivan R.; Siccardi, Henry; Wang, Tong; Narang, Vishal; Abzalimov, Rinat; Wijerathne, Nadeesha; Ulijn, Rein V.
2016-11-01
Sequence-specific polymers, such as oligonucleotides and peptides, can be used as building blocks for functional supramolecular nanomaterials. The design and selection of suitable self-assembling sequences is, however, challenging because of the vast combinatorial space available. Here we report a methodology that allows the peptide sequence space to be searched for self-assembling structures. In this approach, unprotected homo- and heterodipeptides (including aromatic, aliphatic, polar and charged amino acids) are subjected to continuous enzymatic condensation, hydrolysis and sequence exchange to create a dynamic combinatorial peptide library. The free-energy change associated with the assembly process itself gives rise to selective amplification of self-assembling candidates. By changing the environmental conditions during the selection process, different sequences and consequent nanoscale morphologies are selected.
Dynamic peptide libraries for the discovery of supramolecular nanomaterials.
Pappas, Charalampos G; Shafi, Ramim; Sasselli, Ivan R; Siccardi, Henry; Wang, Tong; Narang, Vishal; Abzalimov, Rinat; Wijerathne, Nadeesha; Ulijn, Rein V
2016-11-01
Sequence-specific polymers, such as oligonucleotides and peptides, can be used as building blocks for functional supramolecular nanomaterials. The design and selection of suitable self-assembling sequences is, however, challenging because of the vast combinatorial space available. Here we report a methodology that allows the peptide sequence space to be searched for self-assembling structures. In this approach, unprotected homo- and heterodipeptides (including aromatic, aliphatic, polar and charged amino acids) are subjected to continuous enzymatic condensation, hydrolysis and sequence exchange to create a dynamic combinatorial peptide library. The free-energy change associated with the assembly process itself gives rise to selective amplification of self-assembling candidates. By changing the environmental conditions during the selection process, different sequences and consequent nanoscale morphologies are selected.
A clone-free, single molecule map of the domestic cow (Bos taurus) genome.
Zhou, Shiguo; Goldstein, Steve; Place, Michael; Bechner, Michael; Patino, Diego; Potamousis, Konstantinos; Ravindran, Prabu; Pape, Louise; Rincon, Gonzalo; Hernandez-Ortiz, Juan; Medrano, Juan F; Schwartz, David C
2015-08-28
The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation. The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). 
Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.
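Comparing a sequence build with a restriction-based optical map typically starts from an in-silico digest of the sequence, whose fragment sizes can then be matched against the measured map. The sketch below is illustrative only: it digests a linear toy sequence at BamHI sites (recognition sequence GGATCC, cut after the first G) and reports fragment lengths. It is not the Optical Mapping System's pipeline, and the function name and sequence are hypothetical.

```python
def digest(sequence, site="GGATCC", cut_offset=1):
    """Return fragment lengths from cutting a linear sequence at each site.

    cut_offset is the cut position within the site; 1 models BamHI (G^GATCC).
    GGATCC is palindromic, so scanning one strand finds every site.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy sequence with two BamHI sites, for illustration only.
toy = "AAAA" + "GGATCC" + "TTTTTT" + "GGATCC" + "CC"
fragments = digest(toy)
print(fragments, sum(fragments) / len(fragments))
```

In such a comparison, a site predicted by the digest but absent from the map, or the reverse, would correspond to the missing-cut and extra-cut categories listed in the abstract.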
Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D
Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. 
Copyright © 2018 Elsevier Inc. All rights reserved.
Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data.
Parker, Nicolas J; Parker, Andrew G
2008-04-18
The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
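Searching a read set for near-exact text matches, as described in this abstract, can be sketched with a simple sliding-window comparison. The version below allows a fixed number of substitutions (Hamming distance); that is an assumption made for brevity, since the abstract does not specify the matching rule and pyrosequencing errors are often insertions or deletions in homopolymer runs. All names and reads are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(reads, query, max_mismatches=1):
    """Return (read_index, offset) of the first near match in each read."""
    hits = []
    width = len(query)
    for index, read in enumerate(reads):
        for offset in range(len(read) - width + 1):
            if hamming(read[offset:offset + width], query) <= max_mismatches:
                hits.append((index, offset))
                break  # Report only the first hit per read.
    return hits


# Toy reads, for illustration only.
reads = ["ACGTACGTAA", "TTTTACCTAC", "GGGGGGGGGG"]
print(find_near_matches(reads, "ACGTAC"))
```

Counting how many reads hit a candidate sequence in this way is one simple means of checking an assembled region against the underlying fragment data.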
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...
2016-09-16
The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
Effective de novo assembly of fish genome using haploid larvae.
Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru
2016-02-01
Recent improvements in next-generation sequencing technology have made whole-genome sequencing possible even for non-model eukaryote species with no available reference genomes. However, de novo assembly of diploid genomes is still a big challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per dataset) generated by the Ion Proton sequencer (200 bp) was done under two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). This revealed that the assembly of the haploid genome significantly reduced (approximately 22% for OLC, 9% for DBG) the total number of contigs (with longer average and N50 contig lengths) when compared to the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns: 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. The haploid genome assembly appears to be better because the allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
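The contig statistics used to compare the haploid and diploid assemblies (number of contigs, average length, N50) are simple to compute from a list of contig lengths. The following Python sketch illustrates them under the common definition of N50 as the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function names and the toy length lists are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


def summarize(lengths):
    """Return contig count, mean contig length, and N50 for one assembly."""
    return {
        "contigs": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Toy example (made-up lengths): the same 1,500 bp split into more or fewer contigs.
fragmented = [100, 200, 300, 400, 500]
contiguous = [300, 700, 500]
print(summarize(fragmented))
print(summarize(contiguous))
```

With the same total length, the assembly with fewer contigs has both a larger mean length and a larger N50, which is the pattern the abstract reports for the haploid assembly.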
USDA-ARS's Scientific Manuscript database
The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sakakibara, Yasubumi
2011-10-13
Keio University's Yasubumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly.
Garcia, T I; Shen, Y; Catchen, J; Amores, A; Schartl, M; Postlethwait, J; Walter, R B
2012-01-01
For many researchers, next-generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. Published by Elsevier Inc.
Phylogenomics from Whole Genome Sequences Using aTRAM.
Allen, Julie M; Boyd, Bret; Nguyen, Nam-Phuong; Vachaspati, Pranjal; Warnow, Tandy; Huang, Daisie I; Grady, Patrick G S; Bell, Kayce C; Cronk, Quentin C B; Mugisha, Lawrence; Pittendrigh, Barry R; Leonardi, M Soledad; Reed, David L; Johnson, Kevin P
2017-09-01
Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes (<1000 Mbp) it is feasible to sequence the entire genome at modest coverage (10-30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above 5-10×, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa.
As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel
2010-01-15
With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
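Two of the overlap statistics named in this abstract, percent mismatch and k-mer frequencies within the overlap, can be illustrated with a short Python sketch. The abstract does not give the exact feature definitions, so everything here is an assumption for illustration: the function names, the use of a mean k-mer count as the frequency summary, and the restriction to equal-length, gap-free overlaps. It is not the Minimus extension or the Weka models themselves.

```python
from collections import Counter


def percent_mismatch(a, b):
    """Percentage of positions that differ between two equal-length,
    gap-free overlap strings (an assumed simplification)."""
    if len(a) != len(b) or not a:
        raise ValueError("overlap strings must be non-empty and equal in length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def overlap_features(suffix, prefix, counts, k):
    """Build a small feature dictionary for one candidate overlap.

    suffix is the overlapping end of one read and prefix the overlapping
    start of the other. A high mean k-mer count hints that the overlap
    may lie in repeated sequence.
    """
    kmers = [suffix[i:i + k] for i in range(len(suffix) - k + 1)]
    mean_count = sum(counts[x] for x in kmers) / len(kmers) if kmers else 0.0
    return {
        "percent_mismatch": percent_mismatch(suffix, prefix),
        "mean_kmer_count": mean_count,
    }
```

Feature dictionaries of this shape could be passed to any classifier to label overlaps as true or false before contig construction, which is the role the abstract assigns to the trained models.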
Shimizu, Tokurou; Tanizawa, Yasuhiro; Mochizuki, Takako; Nagasaki, Hideki; Yoshioka, Terutaka; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The numbers of repeat elements and long terminal repeat retrotransposons were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome. PMID:29259619
Enabling large-scale next-generation sequence assembly with Blacklight
Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.
2014-01-01
Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
Rapid construction of insulated genetic circuits via synthetic sequence-guided isothermal assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Torella, JP; Boehm, CR; Lienert, F
2013-12-28
In vitro recombination methods have enabled one-step construction of large DNA sequences from multiple parts. Although synthetic biological circuits can in principle be assembled in the same fashion, they typically contain repeated sequence elements such as standard promoters and terminators that interfere with homologous recombination. Here we use a computational approach to design synthetic, biologically inactive unique nucleotide sequences (UNSes) that facilitate accurate ordered assembly. Importantly, our designed UNSes make it possible to assemble parts with repeated terminator and insulator sequences, and thereby create insulated functional genetic circuits in bacteria and mammalian cells. Using UNS-guided assembly to construct repeating promoter-gene-terminator parts, we systematically varied gene expression to optimize production of a deoxychromoviridans biosynthetic pathway in Escherichia coli. We then used this system to construct complex eukaryotic AND-logic gates for genomic integration into embryonic stem cells. Construction was performed by using a standardized series of UNS-bearing BioBrick-compatible vectors, which enable modular assembly and facilitate reuse of individual parts. UNS-guided isothermal assembly is broadly applicable to the construction and optimization of genetic circuits and particularly those requiring tight insulation, such as complex biosynthetic pathways, sensors, counters and logic gates.
Jo, Yeonhwa; Choi, Hoseong; Kim, Sang-Min; Kim, Sun-Lim; Lee, Bong Choon; Cho, Won Kyong
2016-08-09
Next-generation sequencing (NGS) provides many possibilities for plant virology research. In this study, we performed integrated analyses using plant transcriptome data for plant virus identification using Apple stem grooving virus (ASGV) as an exemplar virus. We used 15 publicly available transcriptome libraries from three different studies, two mRNA-Seq studies and a small RNA-Seq study. We de novo assembled nearly complete genomes of ASGV isolates Fuji and Cuiguan from apple and pear transcriptomes, respectively, and identified single nucleotide variations (SNVs) of ASGV within the transcriptomes. We demonstrated the application of NGS raw data to confirm viral infections in the plant transcriptomes. In addition, we compared the usability of two de novo assemblers, Trinity and Velvet, for virus identification and genome assembly. A phylogenetic tree revealed that ASGV and Citrus tatter leaf virus (CTLV) are the same virus, which was divided into two clades. Recombination analyses identified six recombination events from 21 viral genomes. Taken together, our in silico analyses using NGS data provide a successful application of plant transcriptomes to reveal extensive information associated with viral genome assembly, SNVs, phylogenetic relationships, and genetic recombination.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
De novo assembly and phasing of a Korean human genome.
Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon
2016-10-13
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6.
This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
Rochi, Lucia; Diéguez, María José; Burguener, Germán; Darino, Martín Alejandro; Pergolesi, María Fernanda; Ingala, Lorena Romina; Cuyeu, Alba Romina; Turjanski, Adrián; Kreff, Enrique Domingo; Sacco, Francisco
2018-03-01
Rust fungi are among the most devastating pathogens of crop plants. The biotrophic fungus Puccinia sorghi Schwein (Ps) is responsible for maize common rust, an endemic disease of maize (Zea mays L.) in Argentina that causes significant yield losses in corn production. In spite of this, the Ps genomic sequence was not available. We used Illumina sequencing to rapidly produce the 99.6 Mb draft genome sequence of Ps race RO10H11247, derived from a single-uredinial isolate from infected maize leaves collected in the Argentine Corn Belt Region during 2010. High quality reads were obtained from 200 bp paired-end and 5000 bp mate-paired libraries and assembled in 15,722 scaffolds. A pipeline which combined an ab initio program with homology-based models and homology to in planta enriched ESTs from four cereal pathogenic fungi (the three sequenced wheat rusts and Ustilago maydis) was used to identify 21,087 putative coding sequences, of which 1599 might be part of the Ps RO10H11247 secretome. Among the 458 highly conserved protein families from the euKaryotic Orthologous Groups (KOG) that occur in a wide range of eukaryotic organisms, 97.5% have at least one member with high homology in the Ps assembly (TBlastN, E-value ≤ e-10) covering more than 50% of the length of the KOG protein. Comparative studies with the three sequenced wheat rust fungi, and microsynteny analysis involving Puccinia striiformis f. sp. tritici (Pst, wheat stripe rust fungus), support the quality achieved. The results presented here show the effectiveness of the Illumina strategy for sequencing dikaryotic genomes of non-model organisms and provide reliable DNA sequence information for genomic studies, including pathogenic mechanisms of this maize fungus and molecular marker design. Copyright © 2016 Elsevier Inc. All rights reserved.
Whole-genome sequencing for comparative genomics and de novo genome assembly.
Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C
2015-01-01
Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
NASA Technical Reports Server (NTRS)
Homem De Mello, Luiz S.; Sanderson, Arthur C.
1991-01-01
The authors introduce two criteria for the evaluation and selection of assembly plans. The first criterion is to maximize the number of different sequences in which the assembly tasks can be executed. The second criterion is to minimize the total assembly time through simultaneous execution of assembly tasks. An algorithm that performs a heuristic search for the best assembly plan over the AND/OR graph representation of assembly plans is discussed. Admissible heuristics for each of the two criteria introduced are presented. Some implementation issues that affect the computational efficiency are addressed.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.
Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
NASA Astrophysics Data System (ADS)
Jay, Z.; Beam, J.; Bailey, C.; Dohnalkova, A.; Planer-Friedrich, B.; Romine, M.; Inskeep, W. P.
2012-12-01
The order Thermoproteales (phylum Crenarchaeota) consists of thermophilic, rod-shaped organisms that are found globally in geothermal habitats ranging in pH from ~3-9. Nearly all isolated Thermoproteales couple the respiration of inorganic sulfur species (e.g. elemental sulfur, thiosulfate, sulfate) to the oxidation of hydrogen or complex organic carbon. Prior 16S rRNA and metagenome analysis revealed four prominent Thermoproteales-like populations in hypoxic, sulfidic hot springs in Yellowstone National Park (YNP), WY, USA (Monarch Geyser [80° C, pH 4], Cistern Spring [76° C, pH 5] and Joseph's Coat Hot Spring [JCHS; 80° C, pH 6]). The objectives of this study were to 1) characterize and compare the indigenous Thermoproteales-like de novo assemblies identified from metagenomic sequence data available for geothermal systems across YNP, 2) determine the metabolic potential of the Thermoproteales-like populations and evaluate their role in the geochemical cycling of organic and inorganic constituents, and 3) contrast both the sequenced genome and growth physiology of the first Thermoproteales isolated from YNP ("Pyrobaculum yellowstonensis" strain WP30) with the indigenous Thermoproteales-like de novo assemblies. Sequences related to either Caldivirga or Vulcanisaeta spp. (Type-I Thermoproteales) were identified in both aerobic and anaerobic habitats ranging in pH ~3 - 6. Thermoproteus or Pyrobaculum spp. (Type-II Thermoproteales) were identified in anoxic habitats, but were constrained to pH values >4. Annotation of the de novo assemblies indicates that both Type-I and Type-II Thermoproteales populations are primarily heterotrophic, although key proteins of the autotrophic dicarboxylate/4-hydroxybutyrate cycle were also identified.
Caldivirga/Vulcanisaeta-like populations appear to respire on elemental sulfur, sulfate, or molecular oxygen, while the Thermoproteus/Pyrobaculum-like population may also oxidize hydrogen and respire on elemental sulfur, thiosulfate, arsenate, or tetrathionate. One of the relevant Thermoproteales Type-II populations was isolated from JCHS and is an anaerobic heterotroph utilizing yeast extract as a carbon and energy source while respiring on elemental sulfur or arsenate, resulting in the production of sulfide or arsenite, respectively. The optimum growth temperature of strain WP30 (75° C) and pH range (4.5 - 7) correspond well with characteristics of the sulfidic sediment used as the original inoculum. A draft genome of strain WP30 reveals that respiration may involve as many as four dimethylsulfoxide molybdopterin oxidoreductases including a putative sulfur reductase and an arsenate reductase. Sequences with high amino acid identity to these reductases were also identified in metagenome data sets from sites containing Type-II populations. Expression data of these terminal reductase genes during the growth of strain WP30 on either sulfur or arsenate were compared to expression results from field sites. These data provide insights regarding the diversity, distribution, and potential role of Thermoproteales-like populations in high-temperature environments of YNP.
Review of general algorithmic features for genome assemblers for next generation sequencers.
Wajid, Bilal; Serpedin, Erchin
2012-04-01
In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview on the art of problem-solving applied within the domain of genome assembly in the next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial to all working on genome assemblers pertaining to the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also discuss briefly the direction in which the area is headed towards along with discussing core issues on software simplicity. Copyright © 2012 Beijing Institute of Genomics, Chinese Academy of Sciences. Published by Elsevier Ltd. All rights reserved.
Shen, Ping; Fan, Jianzhong; Guo, Lihua; Li, Jiahua; Li, Ang; Zhang, Jing; Ying, Chaoqun; Ji, Jinru; Xu, Hao; Zheng, Beiwen; Xiao, Yonghong
2017-05-12
Shigellosis is the most common cause of gastrointestinal infections in developing countries. In China, the species most frequently responsible for shigellosis is Shigella flexneri. S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on biochemical and serological properties. Moreover, increasing numbers of ESBL-producing Shigella strains have been isolated from clinical samples. Despite this, only a few cases of ESBL-producing Shigella have been described in China. Therefore, a better understanding of ESBL-producing Shigella from a genomic standpoint is required. In this study, an S. flexneri type 1a isolate SP1 harboring bla CTX-M-14, which was recovered from a patient with diarrhea, was subjected to whole genome sequencing. The draft genome assembly of S. flexneri strain SP1 consisted of 4,592,345 bp with a G+C content of 50.46%. RAST analysis revealed the genome contained 4798 coding sequences (CDSs) and 100 RNA-encoding genes. We detected one incomplete prophage and six candidate CRISPR loci in the genome. In vitro antimicrobial susceptibility testing demonstrated that strain SP1 is resistant to ampicillin, amoxicillin/clavulanic acid, cefazolin, ceftriaxone and trimethoprim. In silico analysis detected genes mediating resistance to aminoglycosides, β-lactams, phenicol, tetracycline, sulphonamides, and trimethoprim. The bla CTX-M-14 gene was located on an IncFII2 plasmid. A series of virulence factors were identified in the genome. In this study, we report the whole genome sequence of a bla CTX-M-14-encoding S. flexneri strain SP1. Dozens of resistance determinants were detected in the genome and may be responsible for the multidrug resistance of this strain, although further confirmation studies are warranted. Numerous virulence factors identified in the strain suggest that isolate SP1 is potentially pathogenic. The availability of the genome sequence and comparative analysis with other S. flexneri strains provides the basis to further address the evolution of drug resistance mechanisms and pathogenicity in S. flexneri.
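Summary statistics such as the "G+C content of 50.46%" reported above are computed directly from the assembled sequence. The following is a minimal sketch, assuming the assembly is available as a plain string; the function name is hypothetical and ambiguous bases (for example N) are simply ignored, which may differ from how the authors' tools handle them.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)
```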
Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying
2015-01-01
Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominantly around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiencies of the long reads in these aspects using the isogenic, gap-free C. elegans genome. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. elegans genome. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24× reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for the long reads. PMID:26039588
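The N50 contig size quoted above is the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch under that standard definition (the function name is an assumption, and the input is just a list of contig lengths):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
```

Because N50 only looks at lengths, a high value says nothing about correctness, which is why the abstract pairs its 86 Kbp figure with a warning about mis-assembly errors.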
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry; Gurevich, Alexey A.; Dvorkin, Mikhail; Kulikov, Alexander S.; Lesin, Valery M.; Nikolenko, Sergey I.; Pham, Son; Prjibelski, Andrey D.; Pyshkin, Alexey V.; Sirotkin, Alexander V.; Vyahhi, Nikolay; Tesler, Glenn; Pevzner, Pavel A.
2012-01-01
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software. PMID:22506599
Vélez, Julián Reyes; Cameron, Marguerite; Rodríguez-Lecompte, Juan Carlos; Xia, Fangfang; Heider, Luke C.; Saab, Matthew; McClure, J. Trenton; Sánchez, Javier
2017-01-01
The objectives of this study are to determine the occurrence of antimicrobial resistance (AMR) genes using whole-genome sequence (WGS) of Streptococcus uberis (S. uberis) and Streptococcus dysgalactiae (S. dysgalactiae) isolates, recovered from dairy cows in the Canadian Maritime Provinces. A secondary objective included the exploration of the association between phenotypic AMR and the genomic characteristics (genome size, guanine–cytosine content, and occurrence of unique gene sequences). Initially, 91 isolates were sequenced, and of these isolates, 89 were assembled. Furthermore, 16 isolates were excluded due to larger than expected genomic sizes (>2.3 bp × 1,000 bp). In the final analysis, 73 were used with complete WGS and minimum inhibitory concentration records, which were part of the previous phenotypic AMR study, representing 18 dairy herds from the Maritime region of Canada (1). A total of 23 unique AMR gene sequences were found in the bacterial genomes, with a mean number of 8.1 (minimum: 5; maximum: 13) per genome. Overall, there were 10 AMR genes [ANT(6), TEM-127, TEM-163, TEM-89, TEM-95, Linb, Lnub, Ermb, Ermc, and TetS] present only in S. uberis genomes and 2 genes unique (EF-TU and TEM-71) to the S. dysgalactiae genomes; 11 AMR genes [APH(3′), TEM-1, TEM-136, TEM-157, TEM-47, TetM, bl2b, gyrA, parE, phoP, and rpoB] were found in both bacterial species. Two-way tabulations showed association between the phenotypic susceptibility to lincosamides and the presence of linB (P = 0.002) and lnuB (P < 0.001) genes and between the presence of tetM (P = 0.015) and tetS (P = 0.064) genes and phenotypic resistance to tetracyclines only for the S. uberis isolates. The logistic model showed that the odds of resistance (to any of the phenotypically tested antimicrobials) was 4.35 times higher when there were >11 AMR genes present in the genome, compared with <7 AMR genes (P < 0.001). The odds of resistance was lower for S. dysgalactiae than S. uberis (P = 0.031). When the within-herd somatic cell count was >250,000 cells/mL, a trend toward higher odds of resistance compared with the baseline category of <150,000 cells/mL was observed. When the isolate corresponded to a post-mastitis sample, there were lower odds of resistance when compared with non-clinical isolates (P = 0.01). The results of this study showed the strength of associations between phenotypic AMR of both mastitis pathogens and their genotypic resistome and other epidemiological characteristics. PMID:28589129
ERIC Educational Resources Information Center
Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.
2013-01-01
Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…
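The idea that "a computer algorithm pieces these reads together" can be illustrated with a toy greedy overlap merger. This is a teaching sketch only, not the PHAST modules or any production assembler: it ignores sequencing errors, reverse complements and repeats, and the function names and the `min_len` parameter are assumptions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
```

With the three made-up reads `["ATGGCG", "GCGTAC", "TACGGA"]`, each pair overlaps by three bases and the sketch returns the single contig `ATGGCGTACGGA`. Real assemblers replace this quadratic pairwise search with overlap or de Bruijn graphs.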
Human Contamination in Public Genome Assemblies.
Kryukov, Kirill; Imanishi, Tadashi
2016-01-01
Contamination in a genome assembly can lead to wrong or confusing results when that genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that with high confidence originate as contamination from human DNA. The majority of contaminating human sequences had been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes should be revised to remove contaminated sequence, and that new assemblies should be thoroughly checked for the presence of human DNA before submitting them to public databases.
Next generation sequence assembly with AMOS.
Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai
2011-03-01
A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. © 2011 by John Wiley & Sons, Inc.
2013-01-01
Background The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. Results We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. Conclusions These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies. PMID:23496952
Francis, Warren R; Christianson, Lynne M; Kiko, Rainer; Powers, Meghan L; Shaner, Nathan C; Haddock, Steven H D
2013-03-12
The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.
Binding branched and linear DNA structures: From isolated clusters to fully bonded gels
NASA Astrophysics Data System (ADS)
Fernandez-Castanon, J.; Bomboi, F.; Sciortino, F.
2018-01-01
The proper design of DNA sequences allows for the formation of well-defined supramolecular units with controlled interactions via a consecution of self-assembling processes. Here, we benefit from the controlled DNA self-assembly to experimentally realize particles with well-defined valence, namely, tetravalent nanostars (A) and bivalent chains (B). We specifically focus on the case in which A particles can only bind to B particles, via appropriately designed sticky-end sequences. Hence AA and BB bonds are not allowed. Such a binary mixture system reproduces with DNA-based particles the physics of poly-functional condensation, with an exquisite control over the bonding process, tuned by the ratio, r, between B and A units and by the temperature, T. We report dynamic light scattering experiments in a window of Ts ranging from 10 °C to 55 °C and an interval of r around the percolation transition to quantify the decay of the density correlation for the different cases. At low T, when all possible bonds are formed, the system behaves as a fully bonded network, as a percolating gel, and as a cluster fluid depending on the selected r.
Long-read sequencing and de novo assembly of a Chinese genome
USDA-ARS's Scientific Manuscript database
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
2015-11-18
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment, performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
de la Fuente, José; Díez-Delgado, Iratxe; Contreras, Marinela; Vicente, Joaquín; Cabezas-Cruz, Alejandro; Tobes, Raquel; Manrique, Marina; López, Vladimir; Romero, Beatriz; Bezos, Javier; Dominguez, Lucas; Sevilla, Iker A; Garrido, Joseba M; Juste, Ramón; Madico, Guillermo; Jones-López, Edward; Gortazar, Christian
2015-11-01
Mycobacteria of the Mycobacterium tuberculosis complex (MTBC) greatly affect humans and animals worldwide. The life cycle of mycobacteria is complex and the mechanisms resulting in pathogen infection and survival in host cells are not fully understood. Recently, comparative genomics analyses have provided new insights into the evolution and adaptation of the MTBC to survive inside the host. However, most of this information has been obtained using M. tuberculosis but not other members of the MTBC such as M. bovis and M. caprae. In this study, the genome of three M. bovis (MB1, MB3, MB4) and one M. caprae (MB2) field isolates with different lesion score, prevalence and host distribution phenotypes were sequenced. Genome sequence information was used for whole-genome and protein-targeted comparative genomics analysis with the aim of finding correlates with phenotypic variation with potential implications for tuberculosis (TB) disease risk assessment and control. At the whole-genome level the results of the first comparative genomics study of field isolates of M. bovis including M. caprae showed that as previously reported for M. tuberculosis, sequential chromosomal nucleotide substitutions were the main driver of the M. bovis genome evolution. The phylogenetic analysis provided a strong support for the M. bovis/M. caprae clade, but supported M. caprae as a separate species. The comparison of the MB1 and MB4 isolates revealed differences in genome sequence, including gene families that are important for bacterial infection and transmission, thus highlighting differences with functional implications between isolates otherwise classified with the same spoligotype. 
Strategic protein-targeted analysis using the ESX or type VII secretion system, proteins linking stress response with lipid metabolism, host T cell epitopes of mycobacteria, antigens and peptidoglycan assembly protein identified new genetic markers and candidate vaccine antigens that warrant further study to develop tools to evaluate risks for TB disease caused by M. bovis/M. caprae and for TB control in humans and animals.
SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly
Wala, Jeremiah; Beroukhim, Rameen
2017-01-01
We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org PMID:28011768
SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.
Wala, Jeremiah; Beroukhim, Rameen
2017-03-01
We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Ashrafi, Hamid; Hill, Theresa; Stoffel, Kevin; Kozik, Alexander; Yao, Jiqiang; Chin-Wo, Sebastian Reyes; Van Deynze, Allen
2012-10-30
Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Before the pepper genome sequence became available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.
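Simple sequence repeat (SSR) detection of the kind described above amounts to scanning contigs for short motifs repeated in tandem. The sketch below uses a regular expression with a back-reference; it is not MISA and does not reproduce the authors' pipeline. The function name and the repeat thresholds are illustrative assumptions, and a motif that is itself a repeat of a shorter unit may be reported under the longer length as well.

```python
import re


def find_ssrs(seq, thresholds=((2, 6), (3, 5), (4, 5))):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats.

    thresholds holds (motif_length, minimum_repeats) pairs; the values are
    illustrative assumptions, not settings taken from the study.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in thresholds:
        # One motif copy, then at least (min_rep - 1) further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not a di-/tri-/tetranucleotide SSR
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

On a made-up sequence such as `"GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"`, the sketch reports an (AT)7 repeat starting at position 3 and a (CAG)5 repeat starting at position 20 (0-based). Primer design around each hit, as done in the study, would be a separate step.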
General Model for Retroviral Capsid Pattern Recognition by TRIM5 Proteins.
Wagner, Jonathan M; Christensen, Devin E; Bhattacharya, Akash; Dawidziak, Daria M; Roganowicz, Marcin D; Wan, Yueping; Pumroy, Ruth A; Demeler, Borries; Ivanov, Dmitri N; Ganser-Pornillos, Barbie K; Sundquist, Wesley I; Pornillos, Owen
2018-02-15
Restriction factors are intrinsic cellular defense proteins that have evolved to block microbial infections. Retroviruses such as HIV-1 are restricted by TRIM5 proteins, which recognize the viral capsid shell that surrounds, organizes, and protects the viral genome. TRIM5α uses a SPRY domain to bind capsids with low intrinsic affinity (KD of >1 mM) and therefore requires higher-order assembly into a hexagonal lattice to generate sufficient avidity for productive capsid recognition. TRIMCyp, on the other hand, binds HIV-1 capsids through a cyclophilin A domain, which has a well-defined binding site and higher affinity (KD of ∼10 μM) for isolated capsid subunits. Therefore, it has been argued that TRIMCyp proteins have dispensed with the need for higher-order assembly to function as antiviral factors. Here, we show that, consistent with its high degree of sequence similarity with TRIM5α, the TRIMCyp B-box 2 domain shares the same ability to self-associate and facilitate assembly of a TRIMCyp hexagonal lattice that can wrap about the HIV-1 capsid. We also show that under stringent experimental conditions, TRIMCyp-mediated restriction of HIV-1 is indeed dependent on higher-order assembly. Both forms of TRIM5 therefore use the same mechanism of avidity-driven capsid pattern recognition. IMPORTANCE Rhesus macaques and owl monkeys are highly resistant to HIV-1 infection due to the activity of TRIM5 restriction factors. The rhesus macaque TRIM5α protein blocks HIV-1 through a mechanism that requires self-assembly of a hexagonal TRIM5α lattice around the invading viral core. Lattice assembly amplifies very weak interactions between the TRIM5α SPRY domain and the HIV-1 capsid. Assembly also promotes dimerization of the TRIM5α RING E3 ligase domain, resulting in synthesis of polyubiquitin chains that mediate downstream steps of restriction. 
In contrast to rhesus TRIM5α, the owl monkey TRIM5 homolog, TRIMCyp, binds isolated HIV-1 CA subunits much more tightly through its cyclophilin A domain and therefore was thought to act independently of higher-order assembly. Here, we show that TRIMCyp shares the assembly properties of TRIM5α and that both forms of TRIM5 use the same mechanism of hexagonal lattice formation to promote viral recognition and restriction. Copyright © 2018 American Society for Microbiology.
Genome-Wide Analysis of Corynespora cassiicola Leaf Fall Disease Putative Effectors
Lopez, David; Ribeiro, Sébastien; Label, Philippe; Fumanal, Boris; Venisse, Jean-Stéphane; Kohler, Annegret; de Oliveira, Ricardo R.; Labutti, Kurt; Lipzen, Anna; Lail, Kathleen; Bauer, Diane; Ohm, Robin A.; Barry, Kerrie W.; Spatafora, Joseph; Grigoriev, Igor V.; Martin, Francis M.; Pujade-Renaud, Valérie
2018-01-01
Corynespora cassiicola is an Ascomycetes fungus with a broad host range and diverse life styles. Mostly known as a necrotrophic plant pathogen, it has also been associated with rare cases of human infection. In the rubber tree, this fungus causes the Corynespora leaf fall (CLF) disease, which increasingly affects natural rubber production in Asia and Africa. It has also been found as an endophyte in South American rubber plantations where no CLF outbreak has yet occurred. The C. cassiicola species is genetically highly diverse, but no clear relationship has been evidenced between phylogenetic lineage and pathogenicity. Cassiicolin, a small glycosylated secreted protein effector, is thought to be involved in the necrotrophic interaction with the rubber tree, but some virulent C. cassiicola isolates do not have a cassiicolin gene. This study set out to identify other putative effectors involved in CLF. The genome of a highly virulent C. cassiicola isolate from the rubber tree (CCP) was sequenced and assembled. In silico prediction revealed 2870 putative effectors, comprising CAZymes, lipases, peptidases, secreted proteins and enzymes associated with secondary metabolism. Comparison with the genomes of 44 other fungal species, focusing on effector content, revealed a striking proximity with phylogenetically unrelated species (Colletotrichum acutatum, Colletotrichum gloeosporioides, Fusarium oxysporum, Nectria haematococca, and Botryosphaeria dothidea) sharing life style plasticity and broad host range. Candidate effectors involved in the compatible interaction with the rubber tree were identified by transcriptomic analysis. Differentially expressed genes included 92 putative effectors, among them cassiicolin and two other secreted singleton proteins. Finally, the genomes of 35 C. cassiicola isolates representing the genetic diversity of the species were sequenced and assembled, and putative effectors identified. 
At the intraspecific level, effector-based classification was found to be highly consistent with the phylogenomic trees. Identification of lineage-specific effectors is a key step toward understanding C. cassiicola virulence and host specialization mechanisms. PMID:29551995
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.
VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C
2015-11-26
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
Tirera, Sourakhata; Ginouves, Marine; Donato, Damien; Caballero, Ignacio S; Bouchier, Christiane; Lavergne, Anne; Bourreau, Eliane; Mosnier, Emilie; Vantilcke, Vincent; Couppié, Pierre; Prevot, Ghislaine; Lacoste, Vincent
2017-07-01
Leishmania RNA virus type 1 (LRV1) is an endosymbiont of some Leishmania (Viannia) species in South America. Presence of LRV1 in parasites exacerbates disease severity in animal models and humans, related to a disproportionate innate immune response, and is correlated with drug treatment failures in humans. Although the virus was identified decades ago, its genomic diversity has been overlooked until now. We subjected LRV1 strains from 19 L. (V.) guyanensis and one L. (V.) braziliensis isolates, obtained from cutaneous leishmaniasis samples identified throughout French Guiana, to next-generation sequencing and de novo sequence assembly. We generated and analyzed 24 unique LRV1 sequences over their full-length coding regions. Multiple alignment of these new sequences revealed variability (0.5%-23.5%) across the entire sequence except for highly conserved motifs within the 5' untranslated region. Phylogenetic analyses showed that viral genomes of L. (V.) guyanensis grouped into five distinct clusters. They further showed a species-dependent clustering between viral genomes of L. (V.) guyanensis and L. (V.) braziliensis, confirming a long-term co-evolutionary history. Notably, we identified cases of multiple LRV1 infections in three of the 20 Leishmania isolates. Here, we present the first-ever estimate of LRV1 genomic diversity that exists in Leishmania (V.) guyanensis parasites. Genetic characterization and phylogenetic analyses of these viruses have shed light on their evolutionary relationships. To our knowledge, this study is also the first to report cases of multiple LRV1 infections in some parasites. Finally, this work has made it possible to develop molecular tools for adequate identification and genotyping of LRV1 strains for diagnostic purposes. Given the suspected worsening role of LRV1 infection in the pathogenesis of human leishmaniasis, these data have a major impact from a clinical viewpoint and for the management of Leishmania-infected patients.
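The variability range quoted in the abstract above (0.5%-23.5%) is a pairwise difference between aligned sequences. The following is a minimal sketch of one common way to compute such a figure, assuming two already-aligned sequences of equal length and ignoring gap columns. The function name percent_difference and the gap convention are illustrative assumptions, not the authors' pipeline.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percent of differing positions between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as differences).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # skip gap columns
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable (non-gap) columns")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    # One mismatch in eight compared columns.
    print(percent_difference("ACGTACGT", "ACGTACGA"))  # 12.5
```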
A dual-rating method for evaluating impact noise isolation of floor-ceiling assemblies.
LoVerde, John J; Dong, D Wayland
2017-01-01
Impact Insulation Class (IIC), the single-number rating for evaluating the impact noise insulation of a floor-ceiling assembly, and the associated field testing ratings, are unsatisfactory because they do not have strong correlation with subjective reaction nor provide suitable detailed information for evaluation or design of floor-ceiling assemblies. Various proposals have been made for improving the method, but the data presented indicate that no single-number rating can adequately characterize the impact noise isolation of an assembly. For realistic impact noise sources and floor-ceiling assembly types, there are two frequency domains for impact noise, and the impact noise levels in the two domains can vary independently. Therefore, two ratings are required in order to satisfactorily evaluate the impact isolation provided by a floor-ceiling assembly. Two different ratings are introduced for measuring field impact isolation in the two frequency domains, using the existing impact source and measurement method. They are named low-frequency impact rating (LIR) and high-frequency impact rating (HIR). LIR and HIR are proposed to improve the current method for design and evaluation of floor-ceiling assemblies and also provide a better method for predicting subjective reaction.
Moll, Karen M; Zhou, Peng; Ramaraj, Thiruvarangan; Fajardo, Diego; Devitt, Nicholas P; Sadowsky, Michael J; Stupar, Robert M; Tiffin, Peter; Miller, Jason R; Young, Nevin D; Silverstein, Kevin A T; Mudge, Joann
2017-08-04
Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. Here, we present high quality genome assemblies of the model legume plant Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, an M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. 
This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.
Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M
2018-06-15
It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a closely related species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. In addition, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.
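The overlap-layout-consensus (OLC) paradigm mentioned in the abstract above can be illustrated with a toy example. This is a hedged sketch of the general idea only, not BAUM itself: it assumes error-free reads from a single strand, uses a simple greedy merge as the layout step, and omits a real consensus step. The names overlap and greedy_assemble are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        # Overlap step: score every ordered pair of distinct reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        # Layout step (greedy): join the best pair into one sequence.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
    # ['ATGGCGTACCTTA']
```

Real assemblers replace the quadratic pairwise comparison with indexed overlap detection and build a consensus from many noisy reads per position; the sketch only shows why unambiguous overlaps make the layout straightforward and why repeats make it uncertain.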
The A, C, G, and T of Genome Assembly.
Wajid, Bilal; Sohail, Muhammad U; Ekti, Ali R; Serpedin, Erchin
2016-01-01
Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.
An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L
2017-01-01
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.
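The N50 statistic used in the abstract above can be computed directly from a list of contig lengths. The sketch below uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly size); the function name n50 and the example lengths are illustrative, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Total is 30, so the threshold is 15; 10 + 6 = 16 reaches it.
    print(n50([2, 3, 4, 5, 6, 10]))  # 6
```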
NASA Astrophysics Data System (ADS)
Parks, M. C.; Moreno, E.
2016-02-01
Vibrio parahaemolyticus (Vp) is a Gram-negative bacterium and a natural inhabitant of coastal marine ecosystems worldwide. Vp is also a coincidental pathogen of humans. Virulent strains are commonly identified by the presence of the thermostable direct (tdh) or tdh-related (trh) hemolysin genes. However, virulence is multifaceted and many clinical Vp isolates do not carry tdh or trh. In this study, we sequenced and assembled the draft genome of a tdh- and trh-negative environmental isolate (805) shown previously to be highly virulent in zebrafish. To investigate potential mechanisms of virulence, we compared 805 to the clinical V. parahaemolyticus type strain (RIMD2210633). Pairwise comparison revealed the presence of multiple genomic regions including an IncF conjugative pilus (1.3 kb) and a colicin V plasmid (1.49 kb). These features are homologous to genomic regions present in clinical V. vulnificus and V. cholerae strains. Genome comparison also revealed the presence of five toxin-antitoxin systems. Isolate 805 likely attained these new features through the lateral acquisition of mobile genomic material, a hypothesis supported by the aberrant GC content of these regions. Colicin V plasmids are a diverse group of IncF plasmids found in invasive bacterial strains. Similarly, an abundance of toxin-antitoxin systems has been linked to virulence in Gram-negative bacteria. Current efforts are focused on characterizing 142 coding features present in 805 but absent from the type strain.
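The abstract above uses aberrant GC content as evidence for laterally acquired regions. A minimal sketch of how GC content can be profiled along a sequence is given below, assuming a plain nucleotide string and fixed-size windows; the function names and the window and step values are illustrative assumptions, not the method used in the study.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction for each full window, as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(windowed_gc("ATATGCGC", window=4, step=4))
    # [(0, 0.0), (4, 1.0)]
```

Windows whose GC fraction departs strongly from the genome-wide value would be candidates for closer inspection; what counts as "strongly" is a choice the abstract does not specify.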
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore a fast and efficient sequencing approach was needed. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that it also performs well for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Chhuneja, Parveen; Yadav, Bharat; Stirnweis, Daniel; Hurni, Severine; Kaur, Satinder; Elkot, Ahmed Fawzy; Keller, Beat; Wicker, Thomas; Sehgal, Sunish; Gill, Bikram S; Singh, Kuldeep
2015-10-01
A novel powdery mildew resistance gene and a new allele of Pm1 were identified and fine mapped. DNA markers suitable for marker-assisted selection have been identified. Powdery mildew caused by Blumeria graminis is one of the most important foliar diseases of wheat and causes significant yield losses worldwide. Diploid A genome species are an important genetic resource for disease resistance genes. Two powdery mildew resistance genes, identified in Triticum boeoticum (A(b)A(b)) accession pau5088, PmTb7A.1 and PmTb7A.2 were mapped on chromosome 7AL. In the present study, shotgun sequence assembly data for chromosome 7AL were utilised for fine mapping of these Pm resistance genes. Forty SSR, 73 resistance gene analogue-based sequence-tagged sites (RGA-STS) and 36 single nucleotide polymorphism markers were designed for fine mapping of PmTb7A.1 and PmTb7A.2. Twenty-one RGA-STS, 8 SSR and 13 SNP markers were mapped to 7AL. RGA-STS markers Ta7AL-4556232 and 7AL-4426363 were linked to the PmTb7A.1 and PmTb7A.2, at a genetic distance of 0.6 and 6.0 cM, respectively. The present investigation established that PmTb7A.1 is a new powdery mildew resistance gene that confers resistance to a broad range of Bgt isolates, whereas PmTb7A.2 most probably is a new allele of Pm1 based on chromosomal location and screening with Bgt isolates showing differential reaction on lines with different Pm1 alleles. The markers identified to be linked to the two Pm resistance genes are robust and can be used for marker-assisted introgression of these genes to hexaploid wheat.
Pajuelo, Mónica J; Eguiluz, María; Dahlstrom, Eric; Requena, David; Guzmán, Frank; Ramirez, Manuel; Sheen, Patricia; Frace, Michael; Sammons, Scott; Cama, Vitaliano; Anzick, Sarah; Bruno, Dan; Mahanty, Siddhartha; Wilkins, Patricia; Nash, Theodore; Gonzalez, Armando; García, Héctor H; Gilman, Robert H; Porcella, Steve; Zimic, Mirko
2015-12-01
Infections with Taenia solium are the most common cause of adult acquired seizures worldwide, and are the leading cause of epilepsy in developing countries. A better understanding of the genetic diversity of T. solium will improve parasite diagnostics and transmission pathways in endemic areas thereby facilitating the design of future control measures and interventions. Microsatellite markers are useful genome features, which enable strain typing and identification in complex pathogen genomes. Here we describe microsatellite identification and characterization in T. solium, providing information that will assist in global efforts to control this important pathogen. For genome sequencing, T. solium cysts and proglottids were collected from Huancayo and Puno in Peru, respectively. Using next generation sequencing (NGS) and de novo assembly, we assembled two draft genomes and one hybrid genome. Microsatellite sequences were identified and 36 of them were selected for further analysis. Twenty T. solium isolates were collected from Tumbes in the northern region, and twenty from Puno in the southern region of Peru. The size-polymorphism of the selected microsatellites was determined with multi-capillary electrophoresis. We analyzed the association between microsatellite polymorphism and the geographic origin of the samples. The predicted size of the hybrid (proglottid genome combined with cyst genome) T. solium genome was 111 MB with a GC content of 42.54%. A total of 7,979 contigs (>1,000 nt) were obtained. We identified 9,129 microsatellites in the Puno-proglottid genome and 9,936 in the Huancayo-cyst genome, with 5 or more repeats, ranging from mono- to hexa-nucleotide. Seven microsatellites were polymorphic and 29 were monomorphic within the analyzed isolates. T. solium tapeworms were classified into two genetic groups that correlated with the North/South geographic origin of the parasites. The availability of draft genomes for T. 
solium represents a significant step towards understanding the biology of the parasite. We report here a set of T. solium polymorphic microsatellite markers that appear promising for genetic epidemiology studies.
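The microsatellite criterion stated in the abstract above (five or more repeats of a mono- to hexa-nucleotide unit) can be expressed as a regular expression. This is an illustrative sketch only: the abstract does not name the tool the authors used, and the names SSR_PATTERN and find_microsatellites are hypothetical.

```python
import re

# A unit of 1 to 6 bases (lazy, so the shortest unit is tried first),
# followed by at least 4 more copies, giving 5 or more repeats in total.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def find_microsatellites(seq):
    """Return (start, unit, repeat_count) for each perfect repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


if __name__ == "__main__":
    print(find_microsatellites("TTGACACACACACGGT"))
    # [(3, 'AC', 5)]
```

The pattern only finds perfect, uninterrupted repeats; dedicated microsatellite finders typically also handle imperfect repeats and apply different minimum counts per unit length.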
Mutations in the putative calcium-binding domain of polyomavirus VP1 affect capsid assembly
NASA Technical Reports Server (NTRS)
Haynes, J. I. 2nd; Chang, D.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)
1993-01-01
Calcium ions appear to play a major role in maintaining the structural integrity of the polyomavirus and are likely involved in the processes of viral uncoating and assembly. Previous studies demonstrated that a VP1 fragment extending from Pro-232 to Asp-364 has calcium-binding capabilities. This fragment contains an amino acid stretch from Asp-266 to Glu-277 which is quite similar in sequence to the amino acids that make up the calcium-binding EF hand structures found in many proteins. To assess the contribution of this domain to polyomavirus structural integrity, the effects of mutations in this region were examined by transfecting mutated viral DNA into susceptible cells. Immunofluorescence studies indicated that although viral protein synthesis occurred normally, infective viral progeny were not produced in cells transfected with polyomavirus genomes encoding either a VP1 molecule lacking amino acids Thr-262 through Gly-276 or a VP1 molecule containing a mutation of Asp-266 to Ala. VP1 molecules containing the deletion mutation were unable to bind 45Ca in an in vitro assay. Upon expression in Escherichia coli and purification by immunoaffinity chromatography, wild-type VP1 was isolated as pentameric, capsomere-like structures which could be induced to form capsid-like structures upon addition of CaCl2, consistent with previous studies. However, although VP1 containing the point mutation was isolated as pentamers which were indistinguishable from wild-type VP1 pentamers, addition of CaCl2 did not result in their assembly into capsid-like structures. Immunogold labeling and electron microscopy studies of transfected mammalian cells provided in vivo evidence that a mutation in this region affects the process of viral assembly.
Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen
2018-01-01
Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated, which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex, highly repetitive, heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Lakshmanan, Anupama; Cheong, Daniel W; Accardo, Angelo; Di Fabrizio, Enzo; Riekel, Christian; Hauser, Charlotte A E
2013-01-08
The self-assembly of abnormally folded proteins into amyloid fibrils is a hallmark of many debilitating diseases, from Alzheimer's and Parkinson diseases to prion-related disorders and diabetes type II. However, the fundamental mechanism of amyloid aggregation remains poorly understood. Core sequences of four to seven amino acids within natural amyloid proteins that form toxic fibrils have been used to study amyloidogenesis. We recently reported a class of systematically designed ultrasmall peptides that self-assemble in water into cross-β-type fibers. Here we compare the self-assembly of these peptides with natural core sequences. These include core segments from Alzheimer's amyloid-β, human amylin, and calcitonin. We analyzed the self-assembly process using circular dichroism, electron microscopy, X-ray diffraction, rheology, and molecular dynamics simulations. We found that the designed aliphatic peptides exhibited a similar self-assembly mechanism to several natural sequences, with formation of α-helical intermediates being a common feature. Interestingly, the self-assembly of a second core sequence from amyloid-β, containing the diphenylalanine motif, was distinctly different from all other examined sequences. The diphenylalanine-containing sequence formed β-sheet aggregates without going through the α-helical intermediate step, giving a unique fiber-diffraction pattern and simulation structure. Based on these results, we propose a simplified aliphatic model system to study amyloidosis. Our results provide vital insight into the nature of early intermediates formed and suggest that aromatic interactions are not as important in amyloid formation as previously postulated. This information is necessary for developing therapeutic drugs that inhibit and control amyloid formation.
FMLRC: Hybrid long read error correction using an FM-index.
Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D
2018-02-09
Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
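The abstract above names a multi-string Burrows-Wheeler Transform with an FM-index but gives no implementation details. The following is a minimal sketch of that general data structure only, not the FMLRC code: it builds a toy BWT over a few short reads and counts k-mer occurrences by backward search. The function names (`bwt_multi`, `fm_count`, `weak_kmers`), the naive rotation sort, and the idea of flagging long-read k-mers with low short-read support are all illustrative assumptions.

```python
from collections import Counter

def bwt_multi(strings):
    """Toy multi-string BWT: concatenate reads with '$' terminators.

    Uses a naive rotation sort, which is only suitable for tiny inputs.
    """
    text = "".join(s + "$" for s in strings)
    order = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
    # The BWT is the character preceding each sorted rotation (cyclically).
    return "".join(text[i - 1] for i in order)

def fm_count(bwt, pattern):
    """Count occurrences of pattern in the indexed reads via backward search."""
    counts = Counter(bwt)
    first = {}   # first[c] = number of characters strictly smaller than c
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    def occ(c, i):
        # Occurrences of c in bwt[:i]; a real FM-index precomputes this.
        return bwt[:i].count(c)

    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

def weak_kmers(long_read, k, bwt, min_count=1):
    """Hypothetical helper: positions of k-mers with little short-read support."""
    return [i for i in range(len(long_read) - k + 1)
            if fm_count(bwt, long_read[i:i + k]) < min_count]

short_reads = ["ACGT", "CGTA", "GTAC"]   # made-up example reads
index = bwt_multi(short_reads)
print(fm_count(index, "CGT"))            # k-mer present in two reads
print(weak_kmers("ACGTTAC", 3, index))   # k-mers absent from the short reads
```

In this sketch a k-mer from a long read that never appears in the short reads is treated as a candidate error site; how such sites are actually corrected is outside what the abstract states.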
Reducing assembly complexity of microbial genomes with single-molecule sequencing
USDA-ARS's Scientific Manuscript database
Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...
Kingry, Luke C; Batra, Dhwani; Replogle, Adam; Rowe, Lori A; Pritt, Bobbi S; Petersen, Jeannine M
2016-01-01
Borrelia mayonii, a Borrelia burgdorferi sensu lato (Bbsl) genospecies, was recently identified as a cause of Lyme borreliosis (LB) among patients from the upper midwestern United States. By microscopy and PCR, spirochete/genome loads in infected patients were estimated at 10^5 to 10^6 per milliliter of blood. Here, we present the full chromosome and plasmid sequences of two B. mayonii isolates, MN14-1420 and MN14-1539, cultured from blood of two of these patients. Whole genome sequencing and assembly was conducted using PacBio long read sequencing (Pacific Biosciences RSII instrument) followed by hierarchical genome-assembly process (HGAP). The B. mayonii genome is ~1.31 Mbp in size (26.9% average GC content) and is comprised of a linear chromosome, 8 linear and 7 circular plasmids. Consistent with its taxonomic designation as a new Bbsl genospecies, the B. mayonii linear chromosome shares only 93.83% average nucleotide identity with other genospecies. Both B. mayonii genomes contain plasmids similar to B. burgdorferi sensu stricto lp54, lp36, lp28-3, lp28-4, lp25, lp17, lp5, 5 cp32s, cp26, and cp9. The vls locus present on lp28-10 of B. mayonii MN14-1420 is remarkably long, being comprised of 24 silent vls cassettes. Genetic differences between the two B. mayonii genomes are limited and include 15 single nucleotide variations as well as 7 fewer silent vls cassettes and a lack of the lp5 plasmid in MN14-1539. Notably, 68 homologs to proteins present in B. burgdorferi sensu stricto appear to be lacking from the B. mayonii genomes. These include the complement inhibitor, CspZ (BB_H06), the fibronectin binding protein, BB_K32, as well as multiple lipoproteins and proteins of unknown function. This study shows the utility of long read sequencing for full genome assembly of Bbsl genomes, identifies putative genome regions of B. mayonii that may be linked to clinical manifestation or tissue tropism, and provides a valuable resource for pathogenicity, diagnostic and vaccine studies.
Batra, Dhwani; Replogle, Adam; Rowe, Lori A.; Pritt, Bobbi S.; Petersen, Jeannine M.
2016-01-01
Borrelia mayonii, a Borrelia burgdorferi sensu lato (Bbsl) genospecies, was recently identified as a cause of Lyme borreliosis (LB) among patients from the upper midwestern United States. By microscopy and PCR, spirochete/genome loads in infected patients were estimated at 10^5 to 10^6 per milliliter of blood. Here, we present the full chromosome and plasmid sequences of two B. mayonii isolates, MN14-1420 and MN14-1539, cultured from blood of two of these patients. Whole genome sequencing and assembly was conducted using PacBio long read sequencing (Pacific Biosciences RSII instrument) followed by hierarchical genome-assembly process (HGAP). The B. mayonii genome is ~1.31 Mbp in size (26.9% average GC content) and is comprised of a linear chromosome, 8 linear and 7 circular plasmids. Consistent with its taxonomic designation as a new Bbsl genospecies, the B. mayonii linear chromosome shares only 93.83% average nucleotide identity with other genospecies. Both B. mayonii genomes contain plasmids similar to B. burgdorferi sensu stricto lp54, lp36, lp28-3, lp28-4, lp25, lp17, lp5, 5 cp32s, cp26, and cp9. The vls locus present on lp28-10 of B. mayonii MN14-1420 is remarkably long, being comprised of 24 silent vls cassettes. Genetic differences between the two B. mayonii genomes are limited and include 15 single nucleotide variations as well as 7 fewer silent vls cassettes and a lack of the lp5 plasmid in MN14-1539. Notably, 68 homologs to proteins present in B. burgdorferi sensu stricto appear to be lacking from the B. mayonii genomes. These include the complement inhibitor, CspZ (BB_H06), the fibronectin binding protein, BB_K32, as well as multiple lipoproteins and proteins of unknown function. This study shows the utility of long read sequencing for full genome assembly of Bbsl genomes, identifies putative genome regions of B. mayonii that may be linked to clinical manifestation or tissue tropism, and provides a valuable resource for pathogenicity, diagnostic and vaccine studies. PMID:28030649
Budak, Hikmet; Kantar, Melda
2015-07-01
MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomics sequence on the robustness of miRNA prediction was not evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity with the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that utilization of raw reads, as well as assemblies for in silico prediction, have distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.
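The counts reported in the abstract above can be cross-checked with simple set arithmetic. The sketch below assumes that the 16 miRNAs "predicted with high confidence from both datasets" are exactly the overlap between the 62 read-based and 22 assembly-based predictions; that reading, and all variable names, are assumptions rather than statements from the study.

```python
# Counts quoted in the abstract.
raw_reads_total = 62      # miRNAs predicted from raw reads
assembly_total = 22       # miRNAs predicted from the assembly
shared = 16               # assumed overlap between the two sets
assembly_supported = 12   # assembly predictions supported by previous observations

# Inclusion-exclusion: distinct miRNAs across both inputs.
distinct_total = raw_reads_total + assembly_total - shared
assembly_only = assembly_total - shared
raw_only = raw_reads_total - shared

# Fraction of assembly-based predictions with prior support, as a rounded percent.
assembly_support_pct = round(100 * assembly_supported / assembly_total)

print(distinct_total, assembly_only, raw_only, assembly_support_pct)
```

Under this assumption, the number of assembly-only predictions matches the 6 miRNAs that the abstract says raw reads could not capture, and 12 of 22 rounds to the quoted 55%.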
Cassette less SOFC stack and method of assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meinhardt, Kerry D
2014-11-18
A cassette less SOFC assembly and a method for creating such an assembly. The SOFC stack is characterized by an electrically isolated stack current path which allows welded interconnection between frame portions of the stack. In one embodiment, electrically isolating a current path comprises the step of sealing an interconnect plate to an interconnect plate frame with an insulating seal. This enables the current path portion to be isolated from the structural frame and enables the cell frame to be welded together.
Conlan, Sean; Thomas, Pamela J.; Deming, Clayton; Park, Morgan; Lau, Anna F.; Dekker, John P.; Snitkin, Evan S.; Clark, Tyson A.; Luong, Khai; Song, Yi; Tsai, Yu-Chih; Boitano, Matthew; Gupta, Jyoti; Brooks, Shelise Y.; Schmidt, Brian; Young, Alice C.; Thomas, James W.; Bouffard, Gerard G.; Blakesley, Robert W.; Mullikin, James C.; Korlach, Jonas; Henderson, David K.; Frank, Karen M.; Palmore, Tara N.; Segre, Julia A.
2014-01-01
Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common healthcare-associated infections nearly impossible to treat. We performed comprehensive surveillance and genomic sequencing to identify carbapenem-resistant Enterobacteriaceae in the NIH Clinical Center patient population and hospital environment in order to articulate the diversity of carbapenemase-encoding plasmids and assess the mobility of these plasmids between bacterial species. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem-resistance genes on a wide array of plasmids. Klebsiella pneumoniae and Enterobacter cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, overriding the epidemiological scenario of plasmid transfer between organisms within this patient. We did, however, find evidence supporting horizontal transfer of carbapenemase-encoding plasmids between Klebsiella pneumoniae, Enterobacter cloacae and Citrobacter freundii in the hospital environment. Our comprehensive sequence data, with full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify wider possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by Klebsiella pneumoniae, Escherichia coli, Enterobacter cloacae and Pantoea species, from unrelated patients and the hospital environment. PMID:25232178
Marcacci, Maurilia; Ancora, Massimo; Mangone, Iolanda; Teodori, Liana; Di Sabatino, Daria; De Massis, Fabrizio; Camma', Cesare; Savini, Giovanni; Lorusso, Alessio
2014-06-01
Dynamic surveillance and characterization of canine distemper virus (CDV) circulating strains are essential against possible vaccine breakthrough events. This study describes the setup of a fast and robust next-generation sequencing (NGS) Ion PGM™ protocol that was used to obtain the complete genome sequence of a CDV isolate (CDV2784/2013). CDV2784/2013 is the prototype of CDV strains responsible for severe clinical distemper in dogs and wolves in Italy during 2013. CDV2784/2013 was isolated on cell culture and total RNA was used for NGS sample preparation. A total of 112.3 Mb of reads were assembled de novo using MIRA version 4.0rc4, which yielded a total number of 403 contigs with 12.1% coverage. The whole genome (15,690 bp) was recovered successfully and compared to those of existing CDV whole genomes. CDV2784/2013 was shown to have 92% nt identity with the Onderstepoort vaccine strain. This study describes for the first time a fast and robust Ion PGM™ platform-based whole genome amplification protocol for non-segmented negative stranded RNA viruses starting from total cell-purified RNA. Additionally, this is the first study reporting the whole genome analysis of an Arctic lineage strain that is known to circulate widely in Europe, Asia and the USA. Copyright © 2014 Elsevier B.V. All rights reserved.
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.
Koren, Sergey; Phillippy, Adam M
2015-02-01
Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans
Tully, Benjamin J.; Graham, Elaina D.; Heidelberg, John F.
2018-01-01
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large. PMID:29337314
Smukowski Heil, Caiti; Burton, Joshua N; Liachko, Ivan; Friedrich, Anne; Hanson, Noah A; Morris, Cody L; Schacherer, Joseph; Shendure, Jay; Thomas, James H; Dunham, Maitreya J
2018-01-01
Interspecific hybridization is a common mechanism enabling genetic diversification and adaptation; however, the detection of hybrid species has been quite difficult. The identification of microbial hybrids is made even more complicated, as most environmental microbes are resistant to culturing and must be studied in their native mixed communities. We have previously adapted the chromosome conformation capture method Hi-C to the assembly of genomes from mixed populations. Here, we show the method's application in assembling genomes directly from an uncultured, mixed population from a spontaneously inoculated beer sample. Our assembly method has enabled us to de-convolute four bacterial and four yeast genomes from this sample, including a putative yeast hybrid. Downstream isolation and analysis of this hybrid confirmed its genome to consist of Pichia membranifaciens and that of another related, but undescribed, yeast. Our work shows that Hi-C-based metagenomic methods can overcome the limitation of traditional sequencing methods in studying complex mixtures of genomes. Copyright © 2017 John Wiley & Sons, Ltd.