Science.gov

Sample records for fully sequenced genomes

  1. Analysis of singleton ORFans in fully sequenced microbial genomes.

    PubMed

    Siew, Naomi; Fischer, Daniel

    2003-11-01

    Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans. PMID:14517975

  2. Genome-Wide Survey of Pseudogenes in 80 Fully Re-sequenced Arabidopsis thaliana Accessions

    PubMed Central

    Yao, Yongfang; Tian, Dacheng; Araki, Hitoshi; Yang, Sihai

    2012-01-01

    Pseudogenes (Ψs), including processed and non-processed Ψs, are ubiquitous genetic elements derived from originally functional genes in all studied genomes within the three kingdoms of life. However, systematic surveys of non-processed Ψs utilizing genomic information from multiple samples within a species are still rare. Here a systematic comparative analysis was conducted of Ψs within 80 fully re-sequenced Arabidopsis thaliana accessions, and 7546 genes, representing ∼28% of the genomic annotated open reading frames (ORFs), were found with disruptive mutations in at least one accession. The distribution of these Ψs on chromosomes showed a significantly negative correlation between Ψs/ORFs and their local gene densities, suggesting a higher proportion of Ψs in gene desert regions, e.g. near centromeres. On the other hand, compared with the non-Ψ loci, even the intact coding sequences (CDSs) in the Ψ loci were found to have shorter CDS length, fewer exons and lower GC content. In addition, a significant functional bias against the null hypothesis was detected in the Ψs mainly involved in responses to environmental stimuli and biotic stress as reported, suggesting that they are likely important for adaptive evolution to rapidly changing environments by pseudogenization to accumulate successive mutations. PMID:23272162

  3. Genome Sequence of Oenococcus oeni OM27, the First Fully Assembled Genome of a Strain Isolated from an Italian Wine.

    PubMed

    Lamontanara, Antonella; Orrù, Luigi; Cattivelli, Luigi; Russo, Pasquale; Spano, Giuseppe; Capozzi, Vittorio

    2014-01-01

    Oenococcus oeni OM27 is a strain selected from "Nero di Troia" wine undergoing spontaneous malolactic fermentation. "Nero di Troia" is a wine made from "Uva di Troia" grapes, an autochthonous black grape variety from the Apulian region (south of Italy). In this paper we present a 1.78-Mb assembly of the O. oeni OM27 genome, the first fully assembled genome of an O. oeni strain from an Italian wine. PMID:24994801

  4. The repertoire of G-protein-coupled receptors in fully sequenced genomes.

    PubMed

    Fredriksson, Robert; Schiöth, Helgi B

    2005-05-01

    The superfamily of G-protein-coupled receptors (GPCRs) is one of the largest and most studied families of proteins. We created Hidden Markov Models derived from sorted groups of GPCRs from our previous detailed phylogenetic classification of human GPCRs and added several other models derived from receptors not found in mammals. We used these models to search entire Genscan data sets from 13 species whose genomes are nearly completely sequenced. We found more than 5000 unique GPCRs that were divided into 15 main groups, and the largest one, the Rhodopsin family, was subdivided into 13 subclasses. The results show that the main families in the human genome, Glutamate, Rhodopsin, Adhesion, Frizzled, and Secretin, arose before the split of nematodes from the chordate lineage. Moreover, several of the subgroups of the Rhodopsin family arose before the split of the lineage leading to vertebrates. We also searched expressed sequence tag (EST) databases and identified more than 20,000 sequences that match GPCRs. Although the GPCRs represent typically 1 to 2% of the Genscan predictions, the ESTs that match GPCRs are typically only 0.01 to 0.001%, indicating that GPCRs in most of the groups are expressed at low levels. We also provide searchable data sets that may be used for annotation and further detailed analysis of the GPCR family. This study provides an extensive overview of the expansion of the gene repertoire for families and subgroups of GPCRs. PMID:15687224

  5. Genome Sequence of Oenococcus oeni OM27, the First Fully Assembled Genome of a Strain Isolated from an Italian Wine

    PubMed Central

    Lamontanara, Antonella; Orrù, Luigi; Cattivelli, Luigi; Russo, Pasquale; Capozzi, Vittorio

    2014-01-01

    Oenococcus oeni OM27 is a strain selected from “Nero di Troia” wine undergoing spontaneous malolactic fermentation. “Nero di Troia” is a wine made from “Uva di Troia” grapes, an autochthonous black grape variety from the Apulian region (south of Italy). In this paper we present a 1.78-Mb assembly of the O. oeni OM27 genome, the first fully assembled genome of an O. oeni strain from an Italian wine. PMID:24994801

  6. Fully Closed Genome Sequences of Five Type Strains of the Genus Cronobacter and One Cronobacter sakazakii Strain

    PubMed Central

    Kassam, Mohamed; Baert, Leen; Tang, Yanjie; Barretto, Caroline; Ngom Bru, Catherine; Klijn, Adrianne; Descombes, Patrick

    2016-01-01

    Cronobacter is associated with infant infections and the consumption of reconstituted infant formula. Here we sequenced and closed six genomes of C. condimenti(T), C. muytjensii(T), C. universalis(T), C. malonaticus(T), C. dublinensis(T), and C. sakazakii that can be used as reference genomes in single nucleotide polymorphism (SNP)-based next-generation sequencing (NGS) analysis for source tracking investigations. PMID:27013041

  7. Fully Closed Genome Sequences of Five Type Strains of the Genus Cronobacter and One Cronobacter sakazakii Strain.

    PubMed

    Moine, Deborah; Kassam, Mohamed; Baert, Leen; Tang, Yanjie; Barretto, Caroline; Ngom Bru, Catherine; Klijn, Adrianne; Descombes, Patrick

    2016-01-01

    Cronobacter is associated with infant infections and the consumption of reconstituted infant formula. Here we sequenced and closed six genomes of C. condimenti(T), C. muytjensii(T), C. universalis(T), C. malonaticus(T), C. dublinensis(T), and C. sakazakii that can be used as reference genomes in single nucleotide polymorphism (SNP)-based next-generation sequencing (NGS) analysis for source tracking investigations. PMID:27013041

  8. Whole Genome Sequencing

    MedlinePlus

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  9. A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelay proteins of organisms with fully sequenced genomes

    PubMed Central

    Salvado, Baldiri; Vilaprinyo, Ester; Sorribas, Albert

    2015-01-01

    Two Component Systems and Phosphorelays (TCS/PR) are environmental signal transduction cascades in prokaryotes and, less frequently, in eukaryotes. The internal domain organization of proteins and the topology of TCS/PR cascades play an important role in shaping the responses of the circuits. It is thus important to maintain updated censuses of TCS/PR proteins in order to identify the various topologies used by nature and enable a systematic study of the dynamics associated with those topologies. To create such a census, we analyzed the proteomes of 7,609 organisms from all domains of life with fully sequenced and annotated genomes. To begin, we survey each proteome searching for proteins containing domains that are associated with internal signal transmission within TCS/PR: Histidine Kinase (HK), Response Regulator (RR) and Histidine Phosphotransfer (HPt) domains, and analyze how these domains are arranged in the individual proteins. Then, we find all types of operon organization and calculate how much more likely proteins that contain TCS/PR domains are to be coded by neighboring genes than one would expect from the genome background of each organism. Finally, we analyze whether the fusion of domains into single TCS/PR proteins is more frequently observed than one might expect from the background of each proteome. We find 50 alternative ways in which the HK, HPt, and RR domains are observed to organize into single proteins. In prokaryotes, TCS/PR coding genes tend to be clustered in operons. 90% of all proteins identified in this study contain just one of the three domains, while 8% of the remaining proteins combine one copy of an HK, a RR, and/or an HPt domain. In eukaryotes, 25% of all TCS/PR proteins have more than one domain. These results might have implications for how signals are internally transmitted within TCS/PR cascades. These implications could explain the selection of the various designs in alternative circumstances. PMID:26339559

  10. Multiplexed Fragaria Chloroplast Genome Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A method to sequence multiple chloroplast genomes that uses the sequencing depth of ultra high throughput sequencing technologies was recently described. Sequencing complete chloroplast genomes can resolve phylogenetic relationships at low taxonomic levels and identify point mutations and indels tha...

  11. SINGLE CELL GENOME SEQUENCING

    PubMed Central

    Yilmaz, Suzan; Singh, Anup K.

    2011-01-01

    Whole genome amplification and next-generation sequencing of single cells has become a powerful approach for studying uncultivated microorganisms that represent 90–99 % of all environmental microbes. Single cell sequencing enables not only the identification of microbes but also linking of functions to species, a feat not achievable by metagenomic techniques. Moreover, it allows the analysis of low abundance species that may be missed in community-based analyses. It has also proved very useful in complementing metagenomics in the assembly and binning of single genomes. With the advent of drastically cheaper and higher throughput sequencing technologies, it is expected that single cell sequencing will become a standard tool in studying the genome and transcriptome of microbial communities. PMID:22154471

  12. Unlocking hidden genomic sequence

    PubMed Central

    Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

    2004-01-01

    Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330

  13. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
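
    To make the draft-versus-finished figures above concrete, the following minimal Python sketch converts the quoted error rates (about 1 error per 2,000 bp for a draft, about 1 per 10,000 bp for a finished genome) and the 98% draft coverage into absolute counts. The 5 Mb genome size and the function names are illustrative assumptions, not values from the record.

```python
# Rates quoted in the abstract: one error per N base pairs.
DRAFT_BP_PER_ERROR = 2_000
FINISHED_BP_PER_ERROR = 10_000
DRAFT_COVERAGE_PERCENT = 98  # "up to 98% of the genome"

# Hypothetical genome size, chosen only for illustration.
GENOME_BP = 5_000_000


def expected_errors(genome_bp, bp_per_error):
    """Expected number of consensus errors at a rate of 1 error per bp_per_error."""
    return genome_bp / bp_per_error


def uncovered_bases(genome_bp, coverage_percent):
    """Bases left uncovered when coverage_percent of the genome is assembled."""
    return genome_bp * (100 - coverage_percent) // 100


draft_errors = expected_errors(GENOME_BP, DRAFT_BP_PER_ERROR)
finished_errors = expected_errors(GENOME_BP, FINISHED_BP_PER_ERROR)
gap_bases = uncovered_bases(GENOME_BP, DRAFT_COVERAGE_PERCENT)

print(f"draft errors:    {draft_errors:.0f}")
print(f"finished errors: {finished_errors:.0f}")
print(f"uncovered bases: {gap_bases}")
```

    Under these assumptions, finishing reduces the expected error count fivefold, whatever genome size is plugged in.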

  14. Genomic sequencing in clinical trials

    PubMed Central

    2011-01-01

    Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed. PMID:22206293

  15. Decoding the human genome sequence.

    PubMed

    Bentley, D R

    2000-10-01

    The year 2000 is marked by the production of the sequence of the human genome. A 'working draft' of high quality sequence covering 90% of the genome has been determined and a quarter is in finished form, including the first two completed chromosomes. All sequence data from the project is made freely available to the community via the Internet, for further analysis and exploitation. The challenge which lies ahead is to decipher the information. Knowledge of the human genome sequence will enable us to understand how the genetic information determines the development, structure and function of the human body. We will be able to explore how variations within our DNA sequence cause disease, how they affect our interaction with our environment and ultimately to develop new and effective ways to improve human health. PMID:11005789

  16. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  17. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  18. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  19. Poultry Genome Sequences: Progress and Outstanding Challenges

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The first build of the chicken genome sequence appeared in March 2004 – the first genome sequence of any animal agriculture species. That sequence was done primarily by whole genome shotgun Sanger sequencing, along with the use of an extensive BAC contig-based physical map to assemble the sequence ...

  20. Sequencing and mapping of the onion genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  1. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534
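
    The component sizes quoted above can be checked against the stated total of about 125 kbp. The short Python sketch below assumes that each inverted repeat occurs twice (one terminal and one internal copy), which is how the abstract's "flanked by terminal and internal inverted repeats" is read here; the variable and function names are illustrative.

```python
# Component sizes in base pairs, as rounded in the abstract.
UNIQUE_LONG = 97_500
UNIQUE_SHORT = 7_700
REPEAT_LONG = 38       # repeat flanking the unique long region
REPEAT_SHORT = 10_000  # repeat flanking the unique short region


def genome_length(ul, us, rl, rs):
    """Total length, assuming each repeat is present in two copies."""
    return ul + us + 2 * rl + 2 * rs


total_bp = genome_length(UNIQUE_LONG, UNIQUE_SHORT, REPEAT_LONG, REPEAT_SHORT)
print(f"{total_bp} bp, i.e. about {round(total_bp / 1000)} kbp")
```

    The sum agrees with the estimated genome size once rounded to the nearest kbp, so the two-copy reading of the repeats is at least consistent with the reported figures.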

  2. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  3. Meeting Highlights: Genome Sequencing and Biology 2001

    PubMed Central

    2001-01-01

    We bring you a report from the CSHL Genome Sequencing and Biology Meeting, which has a long and prestigious history. This year there were sessions on large-scale sequencing and analysis, polymorphisms (covering discovery and technologies and mapping and analysis), comparative genomics of mammalian and model organism genomes, functional genomics and bioinformatics. PMID:18628920

  4. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  5. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs were found to be conserved during evolution; of these, all but one, which occurs in both liverworts and mosses, are shared only by the two liverworts, the two mosses or the two hornworts. SSRs are known as DNA tracts having high mutation rates; however, according to our observations, they still can evolve slowly. The conservation of these SSRs suggests that they are under strong selection and could play critical roles in maintaining the gene functions. PMID:24491104

  6. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the

  7. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  8. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research. PMID:25721271

  9. Draft Genome Sequences of Fungus Aspergillus calidoustus.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A; Petzke, Lutz; Valiante, Vito

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  10. Draft Genome Sequences of Fungus Aspergillus calidoustus

    PubMed Central

    Horn, Fabian; Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A.; Petzke, Lutz

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  11. Value of a newly sequenced bacterial genome.

    PubMed

    Barbosa, Eudes Gv; Aburjaile, Flavia F; Ramos, Rommel Tj; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-05-26

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  12. Corrected sequence of the wheat plastid genome.

    PubMed

    Bahieldin, Ahmed; Al-Kordy, Magdy A; Shokry, Ahmed M; Gadalla, Nour O; Al-Hejin, Ahmed M M; Sabir, Jamal S M; Hassan, Sabah M; Al-Ahmadi, Ahlam A; Schwarz, Erika N; Eissa, Hala F; El-Domyati, Fotouh M; Jansen, Robert K

    2014-09-01

    Wheat is the most important cereal in the world in terms of acreage and productivity. We sequenced and assembled the plastid genome of one Egyptian wheat cultivar using next-generation sequence data. The size of the plastid genome is 133,873 bp, which is 672 bp smaller than the published plastid genome of "Chinese Spring" cultivar, due mainly to the presence of three sequences from the rice plastid genome. The difference in size between the previously published wheat plastid genome and the sequence reported here is due to contamination of the published genome with rice plastid DNA, most of which is present in three sequences of 332, 131 and 131 bp. The corrected plastid genome of wheat has been submitted to GenBank (accession number KJ592713) and can be used in future comparisons. PMID:25242688

  13. The fungal genome initiative and lessons learned from genome sequencing.

    PubMed

    Cuomo, Christina A; Birren, Bruce W

    2010-01-01

    The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The "fungal genome initiative" represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal kingdom. Through such an approach, scientists can not only better understand specific organisms but also illuminate the shared and unique aspects of fungal biology that underlie the importance of fungi in biomedical research, health, food production, and industry. To date, assembled genomes for over 100 fungi are available in public databases, and many more sequencing projects are underway. Here, we discuss both examples of findings from comparative analysis of fungal sequences, with a specific emphasis on yeast genomes, and on the analytical approaches taken to mine fungal genomes. New sequencing methods are accelerating comparative studies of fungi by reducing the cost and difficulty of sequencing. This has driven more common use of sequencing applications, such as to study genome-wide variation in populations or to deeply profile RNA transcripts. These and further technological innovations will continue to be piloted in yeasts and other fungi, and will expand the applications of sequencing to study fungal biology. PMID:20946837

  14. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S.; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
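
    The scoring idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the add-one smoothing, and the window and step sizes are all assumptions made for the example. A chain of order 7 conditions each base on the preceding seven, so training amounts to counting octanucleotides.

```python
import math
from collections import defaultdict

def train_markov(sequences, order=7, pseudocount=1.0):
    """Count (order+1)-mers (octanucleotides when order=7) and return a
    function giving log P(next base | preceding `order` bases).
    The pseudocount smoothing is an assumption of this sketch."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            context, nxt = seq[i:i + order], seq[i + order]
            if set(context + nxt) <= set("ACGT"):
                counts[context][nxt] += 1

    def log_prob(context, nxt):
        row = counts.get(context, {})
        total = sum(row.values()) + 4 * pseudocount
        return math.log((row.get(nxt, 0.0) + pseudocount) / total)

    return log_prob

def window_scores(seq, log_prob, order=7, window=100, step=50):
    """Mean log-probability per base in sliding windows.
    Windows with unusually low scores are candidates for atypical regions."""
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        lp = [log_prob(chunk[i:i + order], chunk[i + order])
              for i in range(len(chunk) - order)
              if set(chunk[i:i + order + 1]) <= set("ACGT")]
        scores.append((start, sum(lp) / len(lp) if lp else float("nan")))
    return scores
```

    In use, the model would be trained on a compilation of sequences from one organism and then applied to a query sequence; how low a window score must be to count as significant is the statistical question the abstract says the authors address, and it is not reproduced here.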

  15. Genome Sequence of Gordonia Phage Yvonnetastic.

    PubMed

    Pope, Welkin H; Bandyopadhyay, Anshika; Carlton, Meghan L; Kane, Meghan T; Panchal, Niyati J; Pham, Yvonne C; Reynolds, Zachary J; Sapienza, Michael S; German, Brian A; McDonnell, Jill E; Schafer, Claire E; Yu, Victor J; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  16. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    PubMed

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences. PMID:22247527

  17. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  18. Genome Sequence of Gordonia Phage Yvonnetastic

    PubMed Central

    Bandyopadhyay, Anshika; Carlton, Meghan L.; Kane, Meghan T.; Panchal, Niyati J.; Pham, Yvonne C.; Reynolds, Zachary J.; Sapienza, Michael S.; German, Brian A.; McDonnell, Jill E.; Schafer, Claire E.; Yu, Victor J.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  19. Next generation sequencing of viral RNA genomes

    PubMed Central

    2013-01-01

    Background: With the advent of Next Generation Sequencing (NGS) technologies, the ability to generate large amounts of sequence data has revolutionized the genomics field. Most RNA viruses have relatively small genomes in comparison to other organisms and as such, would appear to be an obvious success story for the use of NGS technologies. However, due to the relatively low abundance of viral RNA in relation to host RNA, RNA viruses have proved relatively difficult to sequence using NGS technologies. Here we detail a simple, robust methodology, without the use of ultra-centrifugation, filtration or viral enrichment protocols, to prepare RNA from diagnostic clinical tissue samples, cell monolayers and tissue culture supernatant, for subsequent sequencing on the Roche 454 platform. Results: As representative RNA viruses, full genome sequence was successfully obtained from known lyssaviruses belonging to recognized species and a novel lyssavirus species using these protocols and assembling the reads using de novo algorithms. Furthermore, genome sequences were generated from considerably less than 200 ng RNA, indicating that manufacturers’ minimum template guidance is conservative. In addition to obtaining genome consensus sequence, a high proportion of SNPs (Single Nucleotide Polymorphisms) were identified in the majority of samples analyzed. Conclusions: The approaches reported clearly facilitate successful full genome lyssavirus sequencing and can be universally applied to discovering and obtaining consensus genome sequences of RNA viruses from a variety of sources. PMID:23822119

  20. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  1. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated that this new isolate forms an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species, Psittaciform 2 bornavirus. PMID:26403158

  2. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. PMID:26269219

  3. Sequence Maneuverer: tool for sequence extraction from genomes

    PubMed Central

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; Liaqat, Khurrum; Khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has raised new challenges, particularly in genome analysis. Sequence extraction is a vital step, and many tools have been developed for it. These tools are publicly available but are limited in the kinds of sequence extraction they support, the length of the sequence that can be extracted, and organism specificity, and they lack a user-friendly interface. We have developed a Java-based software package with three modules that can be used independently or sequentially. The tool efficiently extracts sequences from large datasets in a few simple steps. It can extract multiple sequences of any desired length from the genome of any organism. The results were cross-checked against published data. Availability: URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx PMID:23275734
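
    For readers unfamiliar with the task, coordinate-based sequence extraction can be sketched in a few lines. This is not the Sequence Maneuverer code (which is Java-based and not shown in the abstract); the function names and the choice of 1-based inclusive coordinates are assumptions made for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def extract(genome, record_id, start, end, strand="+"):
    """Return bases start..end (1-based, inclusive) from one record.
    With strand="-", return the reverse complement of that span."""
    seq = genome[record_id]
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(
            f"{start}-{end} out of range for {record_id} (length {len(seq)})")
    piece = seq[start - 1:end]
    return piece[::-1].translate(_COMPLEMENT) if strand == "-" else piece
```

    Calling extract repeatedly over a list of (record, start, end, strand) tuples gives batch extraction of multiple sequences of arbitrary length.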

  4. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  5. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  6. Assessing inhomogeneities in bacterial long genomic sequences

    SciTech Connect

    Karlin, S.

    1997-12-01

    Several complete prokaryotic and eukaryotic genomes are already at hand (S. cerevisiae, H. influenzae, M. genitalium, M. jannaschii, Synechocystis sp.) and many are forthcoming (e.g., E. coli, H. pylori, C. elegans). The comparative analysis of genomes generally strives to identify genes and characterize function/structure relationships inferred mostly via amino acid sequence comparisons. We describe concisely methods for comparing genomes (or long contigs) emphasizing sequence features other than gene comparisons. These center on the following measures of genomic organization and sequence heterogeneity: (i) compositional biases of short oligonucleotides; (ii) dinucleotide relative abundance distances within and between genomes; (iii) rare and frequent word (oligonucleotide) determinations and their distributional properties; (iv) r-scan statistics assessing clustering, overdispersion, or excessive evenness of various marker arrays; and (v) characterizations of repeat structures in the genome. 20 refs., 3 figs.

  7. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the ~120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes ~13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  8. Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes

    PubMed Central

    Barthelson, Roger; McFarlin, Adam J.; Rounsley, Steven D.; Young, Sarah

    2011-01-01

    Background: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. Methodology/Principal Findings: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. Conclusions/Significance: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further. PMID:22174807

  9. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  10. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  11. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  12. Complementary DNA sequencing: Expressed sequence tags and human genome project

    SciTech Connect

    Adams, M.D.; Kelley, J.M.; Gocayne, J.D.; Dubnick, M.; Wu, A.; Olde, B.; Moreno, R.F.; Kerlavage, A.R.; McCombie, W.R.; Venter, J.C.; Polymeropoulos, M.H.; Hong Xiao; Merril, C.R.

    1991-06-21

    Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

  13. Complete genome sequence of tobacco mosqueado virus.

    PubMed

    Blawid, Rosana; Rodrigues, Kelly Barreto; de Moraes Rêgo, Camila; Inoue-Nagata, Alice K; Nagata, Tatsuya

    2016-09-01

    We describe the genomic characteristics of a new potyvirus isolated from tobacco plants showing mottling ("mosqueado" in Portuguese) in southern Brazil. The complete genomic sequence consists of 9896 nucleotides, without the poly(A) tail, and shares the highest pairwise nucleotide sequence identities of 68.5 % with pepper yellow mosaic virus and 68.2 % with Brugmansia mosaic virus isolate D437. These identity values are below the level of 76.0 % used as a criterion for species demarcation in the genus Potyvirus based on the complete genome sequence. The viral genomic organization and sequence comparison thus suggest that this virus, tentatively named "tobacco mosqueado virus" (TMosqV), represents a new potyvirus species. PMID:27368991

  14. Automated correction of genome sequence errors

    PubMed Central

    Gajer, Pawel; Schatz, Michael; Salzberg, Steven L.

    2004-01-01

    By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species. PMID:14744981

  15. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    SciTech Connect

    Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Glavina Del Rio, Tijana; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Sims, David; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Chang, Yun-Juan; Larimer, Frank W; Land, Miriam L; Hauser, Loren John; Kyrpides, Nikos C; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; Wang, Ying; Layton, Alice; Allen, Michael S.; Sayler, Gary

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to the genus Thauera, of the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774.

  16. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  17. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  18. Privacy-preserving genome-wide association studies on cloud environment using fully homomorphic encryption

    PubMed Central

    2015-01-01

    Objective: Developed sequencing techniques are yielding large-scale genomic data at low cost. A genome-wide association study (GWAS) targeting genetic variations that are significantly associated with a particular disease offers great potential for medical improvement. However, subjects who volunteer their genomic data expose themselves to the risk of privacy invasion; these privacy concerns prevent efficient genomic data sharing. Our goal is to present a cryptographic solution to this problem. Methods: To maintain the privacy of subjects, we propose encryption of all genotype and phenotype data. To allow the cloud to perform meaningful computation in relation to the encrypted data, we use a fully homomorphic encryption scheme. Noting that we can evaluate typical statistics for GWAS from a frequency table, our solution evaluates frequency tables with encrypted genomic and clinical data as input. We propose to use a packing technique for efficient evaluation of these frequency tables. Results: Our solution supports evaluation of the D′ measure of linkage disequilibrium, the Hardy-Weinberg Equilibrium, the χ² test, etc. In this paper, we take the χ² test and linkage disequilibrium as examples and demonstrate how we can conduct these algorithms securely and efficiently in an outsourcing setting. We demonstrate with experimentation that secure outsourcing computation of one χ² test with 10,000 subjects requires about 35 ms and evaluation of one linkage disequilibrium with 10,000 subjects requires about 80 ms. Conclusions: With appropriate encoding and packing technique, cryptographic solutions based on fully homomorphic encryption for secure computations of GWAS can be practical. PMID:26732892
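
    The observation that typical GWAS statistics depend only on a frequency table can be seen in plaintext form. The sketch below involves no encryption and is not the authors' protocol; it only shows, under textbook definitions, that the χ² statistic for a 2x2 table and the Hardy-Weinberg χ² need nothing beyond a handful of counts. Function names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]], e.g. allele counts in cases vs controls."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square deviation of genotype counts from Hardy-Weinberg
    expectations, using the allele frequency estimated from the counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)
```

    Because both functions take only counts as input, a server that can produce the frequency table from encrypted records never needs per-subject genotypes in the clear to support these tests.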

  19. Complete genome sequence of arracacha mottle virus.

    PubMed

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses. PMID:23001696

  20. The complete plastid genome sequence of Picea jezoensis (Pinaceae: Piceoideae).

    PubMed

    Yang, Jong Cheol; Joo, Minjung; So, Soonku; Yi, Dong-Keun; Shin, Chang Ho; Lee, You-Mi; Choi, Kyung

    2016-09-01

    The nucleotide sequence of the complete chloroplast genome of P. jezoensis was completed. The total genome size was 124 146 bp, containing a pair of very short inverted repeats (IRa and IRb) of 422 bp, which were separated by large single copy (LSC) and small single copy (SSC) regions of 66 956 bp and 56 346 bp, respectively. The overall GC content of the plastid genome was determined as 38.8%. One hundred fifteen genes including 68 peptide-encoding genes, 35 tRNA genes, four rRNA genes, six open-reading frames, and two pseudogenes were annotated. Of these genes, 15 contained one or two introns. Phylogenetic analyses using maximum likelihood (ML) methods were performed on a dataset of 69 protein-coding genes from fully sequenced gymnosperms and other species. PMID:26332576
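
    As a quick consistency check on the reported sizes, the total should equal the two single-copy regions plus both copies of the inverted repeat. The variable names below are illustrative; the numbers are those given in the abstract.

```python
# Region sizes in bp, as reported in the abstract.
lsc, ssc, ir = 66_956, 56_346, 422

# Plastid genome = LSC + SSC + two inverted-repeat copies (IRa and IRb).
total = lsc + ssc + 2 * ir
print(total)  # 124146, matching the reported genome size
```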

  1. A Workshop Report on Wheat Genome Sequencing

    PubMed Central

    Gill, Bikram S.; Appels, Rudi; Botha-Oberholster, Anna-Maria; Buell, C. Robin; Bennetzen, Jeffrey L.; Chalhoub, Boulos; Chumley, Forrest; Dvořák, Jan; Iwanaga, Masaru; Keller, Beat; Li, Wanlong; McCombie, W. Richard; Ogihara, Yasunari; Quetier, Francis; Sasaki, Takuji

    2004-01-01

    Sponsored by the National Science Foundation and the U.S. Department of Agriculture, a wheat genome sequencing workshop was held November 10–11, 2003, in Washington, DC. It brought together 63 scientists of diverse research interests and institutions, including 45 from the United States and 18 from a dozen foreign countries (see list of participants at http://www.ksu.edu/igrow). The objectives of the workshop were to discuss the status of wheat genomics, obtain feedback from ongoing genome sequencing projects, and develop strategies for sequencing the wheat genome. The purpose of this report is to convey the information discussed at the workshop and provide the basis for an ongoing dialogue, bringing forth comments and suggestions from the genetics community. PMID:15514080

  2. Draft Genome Sequence of Goose Dicistrovirus.

    PubMed

    Greninger, Alexander L; Jerome, Keith R

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence identity within the nonstructural open reading frame (ORF) with aparavirus or cripavirus. PMID:26941149

  3. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group, and Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only few differences compared to partial sequences of modern strains. PMID:26212363

  4. Draft Genome Sequence of Goose Dicistrovirus

    PubMed Central

    Jerome, Keith R.

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence identity within the nonstructural open reading frame (ORF) with aparavirus or cripavirus. PMID:26941149

  5. Complete Genome Sequences of 63 Mycobacteriophages

    PubMed Central

    2013-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. The current collection of sequenced mycobacteriophages, all isolated on a single host strain, Mycobacterium smegmatis mc²155, reveals substantial genetic diversity. The complete genome sequences of 63 newly isolated mycobacteriophages expand the resolution of our understanding of phage diversity. PMID:24285655

  6. Genome Sequence of Pseudomonas chlororaphis Strain 189

    PubMed Central

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. PMID:27340063

  7. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.
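
    The anchoring idea described in this record can be illustrated with a small Python sketch. It is not AVID's implementation: a k-mer index stands in for the suffix tree, and a simple quadratic dynamic program stands in for anchor selection. All function names and the `min_len` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def maximal_matches(a: str, b: str, min_len: int = 5):
    """Return maximal exact matches as sorted (start_a, start_b, length) tuples.

    A k-mer index replaces the suffix tree here; simpler, but less scalable.
    """
    index = defaultdict(list)
    for j in range(len(b) - min_len + 1):
        index[b[j:j + min_len]].append(j)

    matches = []
    for i in range(len(a) - min_len + 1):
        for j in index.get(a[i:i + min_len], ()):
            # Skip seeds that can be extended to the left: they lie inside
            # a match that was already reported from an earlier seed.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return sorted(matches)


def chain_anchors(matches):
    """Pick a colinear, non-overlapping subset of matches with maximal total length."""
    matches = sorted(matches)
    if not matches:
        return []
    best = [m[2] for m in matches]   # best chain score ending at each match
    prev = [-1] * len(matches)       # back-pointers for chain recovery
    for k, (i, j, n) in enumerate(matches):
        for p in range(k):
            pi, pj, pn = matches[p]
            if pi + pn <= i and pj + pn <= j and best[p] + n > best[k]:
                best[k] = best[p] + n
                prev[k] = p
    k = max(range(len(matches)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(matches[k])
        k = prev[k]
    return chain[::-1]


seq_a = "GATTACATCCGGTTAA"
seq_b = "GATTACAGGCCGGTTAA"
anchors = chain_anchors(maximal_matches(seq_a, seq_b, min_len=5))
print(anchors)
```

    In an anchor-based global aligner, the regions between consecutive anchors would then be aligned separately, which keeps the expensive alignment step confined to short gaps.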

  8. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida

    PubMed Central

    Liu, Tao; Wang, Guoliang; Chi, Shan; Liu, Cui; Wang, Haiyang

    2015-01-01

    In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp) in size; it contains a large single-copy (LSC, 76,598 bp) and a small single-copy region (SSC, 42,977 bp), separated by two inverted repeats (IRa and IRb: 5,404 bp). The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome. PMID:26426800

  9. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  10. It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research

    PubMed Central

    Richards, Stephen

    2015-01-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218

  11. A Complete Sequence of the T. tengcongensis Genome

    PubMed Central

    Bao, Qiyu; Tian, Yuqing; Li, Wei; Xu, Zuyuan; Xuan, Zhenyu; Hu, Songnian; Dong, Wei; Yang, Jian; Chen, Yanjiong; Xue, Yanfen; Xu, Yi; Lai, Xiaoqin; Huang, Li; Dong, Xiuzhu; Ma, Yanhe; Ling, Lunjiang; Tan, Huarong; Chen, Runsheng; Wang, Jian; Yu, Jun; Yang, Huanming

    2002-01-01

    Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4T (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the rest, 824 CDS (31.8%), are functionally unknown. One of the interesting features of the T. tengcongensis genome is that 86.7% of its genes are encoded on the leading strand of DNA replication. Based on protein sequence similarity, the T. tengcongensis genome is most similar to that of Bacillus halodurans, a mesophilic eubacterium, among all fully sequenced prokaryotic genomes to date. Computational analysis of genes involved in basic metabolic pathways supports the experimental discovery that T. tengcongensis metabolizes sugars as its principal energy and carbon source and utilizes thiosulfate and elemental sulfur, but not sulfate, as electron acceptors. T. tengcongensis, a gram-negative rod by empirical definitions (such as staining), shares many genes that are characteristic of gram-positive bacteria, whereas it is missing molecular components unique to gram-negative bacteria. A strong correlation between the G + C content of tDNA and rDNA genes and the optimal growth temperature is found among the sequenced thermophiles. It is concluded that thermophiles are a biologically and phylogenetically divergent group of prokaryotes that have converged to sustain extreme environmental conditions over evolutionary timescales. [Supplemental material is available online at http://www.genome.org.] PMID:11997336

  12. Genome Sequence of Mycobacteriophage Cabrinians

    PubMed Central

    Chudoff, Dylan; Conboy, Andrew; Conboy, Danielle; Atoulelou, Mireille; Hasan, Sakina; Martinez, Alexandria; Mastrando, Jessica; Roy, Renoy; Schmidt, Robert; Sheed, Kabreeze; Smith, Jewel; Sperratore, Morgan; Struga, Rexhina; Starr, Katelyn; Suppi, Regina; Uguru, Ugo; Terry, Katrina; Villafuerte, Rosendo; Yuan, Vanessa

    2016-01-01

    Mycobacteriophage Cabrinians is a newly isolated phage capable of infecting both Mycobacterium phlei and Mycobacterium smegmatis and was recovered from a soil sample in New York City, NY. Cabrinians has a genome length of 56,669 bp, encodes 101 predicted proteins, and is a member of mycobacteriophages in cluster F. PMID:26847904

  13. Genome Sequence of Mycobacteriophage Cabrinians.

    PubMed

    Chudoff, Dylan; Conboy, Andrew; Conboy, Danielle; Atoulelou, Mireille; Hasan, Sakina; Martinez, Alexandria; Mastrando, Jessica; Roy, Renoy; Schmidt, Robert; Sheed, Kabreeze; Smith, Jewel; Sperratore, Morgan; Struga, Rexhina; Starr, Katelyn; Suppi, Regina; Uguru, Ugo; Terry, Katrina; Villafuerte, Rosendo; Yuan, Vanessa; Dunbar, David

    2016-01-01

    Mycobacteriophage Cabrinians is a newly isolated phage capable of infecting both Mycobacterium phlei and Mycobacterium smegmatis and was recovered from a soil sample in New York City, NY. Cabrinians has a genome length of 56,669 bp, encodes 101 predicted proteins, and is a member of mycobacteriophages in cluster F. PMID:26847904

  14. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes.

    PubMed

    Daniell, Henry; Lee, Seung-Bum; Grevich, Justin; Saski, Christopher; Quesada-Vargas, Tania; Guda, Chittibabu; Tomkins, Jeffrey; Jansen, Robert K

    2006-05-01

    Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and atp synthase genes are the least divergent and the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33-45 direct and inverted repeats ≥30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity, underscoring the importance of choosing appropriate intergenic spacers for plastid transformation and providing valuable new information for phylogenetic utility of the chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed considerable amount of variation, resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that there has been a loss of conserved editing sites in potato and tomato. PMID:16575560

  15. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  16. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  17. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199
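
    Two of the sequence features mentioned in this record, GC content and simple sequence repeats, are easy to compute. The Python sketch below shows one way to do so; the function names, the `min_repeats` threshold, and the toy sequence are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import re


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_ssrs(seq: str, motif: str, min_repeats: int = 3):
    """Return (start, repeat_count) for each tandem run of `motif`.

    Only perfect, uninterrupted repeats are reported.
    """
    pattern = re.compile(f"(?:{re.escape(motif.upper())}){{{min_repeats},}}")
    return [(m.start(), len(m.group()) // len(motif))
            for m in pattern.finditer(seq.upper())]


toy = "TTACACACACGG"  # made-up sequence, not from Dunaliella
print(gc_content("ATGC"))    # 50.0
print(find_ssrs(toy, "AC"))  # [(2, 4)]
```

    Real microsatellite surveys typically also handle imperfect repeats and the reverse-complement motif, which this sketch leaves out.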

  18. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  19. Accelerating Genome Sequencing 100X with FPGAs

    SciTech Connect

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

  20. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV. PMID:16760394

  1. Microbial species delineation using whole genome sequences

    PubMed Central

    Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

    2015-01-01

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
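
    To make the two metrics named in this record concrete, the Python sketch below computes a length-weighted average nucleotide identity and an alignment fraction from a list of gene-level best hits between two genomes. The exact definitions are assumptions made for illustration (the abstract does not spell them out), and the `GeneHit` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GeneHit:
    identity: float      # fraction of identical nucleotides in the aligned gene pair (0..1)
    aligned_length: int  # aligned length in bp


def gani_and_af(hits: Iterable[GeneHit], total_gene_length: int) -> Tuple[float, float]:
    """Return (gANI in percent, AF as a fraction) for one genome against another.

    Assumed definitions:
      gANI = sum(identity * aligned_length) / sum(aligned_length)
      AF   = sum(aligned_length) / total length of all genes in the query genome
    """
    hits = list(hits)
    aligned = sum(h.aligned_length for h in hits)
    if aligned == 0 or total_gene_length <= 0:
        return 0.0, 0.0
    gani = 100.0 * sum(h.identity * h.aligned_length for h in hits) / aligned
    af = aligned / total_gene_length
    return gani, af


# Toy input: two aligned genes covering half of a 4,000-bp gene complement.
gani, af = gani_and_af([GeneHit(1.0, 1000), GeneHit(0.9, 1000)], total_gene_length=4000)
print(round(gani, 2), af)
```

    Reporting the pair together matters because a high identity over a small aligned fraction says little about overall relatedness; any species cutoffs on the two values would be supplied by the caller.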

  2. Sequencing the AML Genome, Transcriptome, and Epigenome

    PubMed Central

    Mardis, Elaine R.

    2014-01-01

    Leukemia is a disease that develops as a result of changes in the genomes of hematopoietic cells, a fact first appreciated by microscopic examination of the bone marrow cell chromosomes of affected patients. These studies revealed that specific subtypes of leukemia diagnosis correlated with specific chromosomal abnormalities, such as the t(15;17) of acute promyelocytic leukemia and the t(9;22) of chronic myeloid leukemia. Over time, our genomic characterization of hematologic malignancies has moved beyond the resolution of the microscope to that of individual nucleotides in the analysis of whole genome sequencing data using state-of-the-art massively parallel sequencing (MPS) instruments and algorithmic analyses of the resulting data. In addition to studying the genomic sequence alterations that occur in patients’ genomes, these same instruments can decode the methylation landscape of the leukemia genome and the resulting RNA expression landscape of the leukemia transcriptome. Broad correlative analyses can then integrate these three data types to better inform researchers and clinicians about the biology of individual acute myeloid leukemia (AML) cases, facilitating improvements in care and prognosis. PMID:25311738

  3. Sorghum genome sequencing by methylation filtration.

    PubMed

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  4. An International Plan to Sequence the Onion Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  5. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    PubMed Central

    Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442
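
    The final step described in this record, deriving the sequence type from the combination of identified alleles, amounts to a table lookup. The Python sketch below illustrates it with a made-up three-locus scheme; the locus names, allele numbers, and sequence types are hypothetical and do not come from any real MLST database, and the BLAST-based allele matching that precedes this step is not shown.

```python
from typing import Dict, Optional, Tuple

# Hypothetical scheme: locus order and profile table are illustrative only.
LOCI: Tuple[str, ...] = ("locusA", "locusB", "locusC")
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Map the best-matching allele number at each locus to a sequence type.

    Returns None when the allele combination is not in the profile table.
    """
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


print(sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}))  # 2
print(sequence_type({"locusA": 9, "locusB": 9, "locusC": 9}))  # None
```

    Keeping the profile table separate from the lookup logic mirrors the description above, where the allele and profile databases are refreshed independently of the method itself.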

  6. Multilocus sequence typing of total-genome-sequenced bacteria.

    PubMed

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442

  7. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  8. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  9. Complete Genome Sequence and Methylome of Salmonella enterica subsp. enterica Cerro, a Frequent Dairy Cow Serovar

    PubMed Central

    Haley, Bradd J.; Pirone, Cary; Muruvanda, Tim; Brown, Eric; Allard, Marc; Karns, Jeffrey S.

    2016-01-01

    Salmonella enterica subsp. enterica serovar Cerro is an infrequent pathogen of humans and other mammals but is frequently isolated from the hindgut of asymptomatic cattle in the United States. To further understand the genomic determinants of S. Cerro specificity for the bovine hindgut, the genome of isolate CFSAN001588 was fully sequenced and deposited in the GenBank database. PMID:26823571

  10. Complete Genome Sequence of Bovine Polyomavirus Type 1 from Aborted Cattle, Isolated in Belgium in 2014.

    PubMed

    Van Borm, Steven; Rosseel, Toon; Behaeghel, Isabelle; Saulmont, Marc; Delooz, Laurent; Petitjean, Thierry; Mathijs, Elisabeth; Vandenbussche, Frank

    2016-01-01

    The complete and fully annotated genome sequence of a bovine polyomavirus type 1 (BPyV/BEL/1/2014) from aborted cattle was assembled from a metagenomics data set. The 4,697-bp circular dsDNA genome contains 6 protein-coding genes. Bovine polyomavirus is unlikely to be causally related to the abortion cases. PMID:26941154

  11. Reducing assembly complexity of microbial genomes with single-molecule sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...

  12. Genome Sequences of Murine Pneumotropic Virus (Polyomaviridae) Detected in Wild House Mice (Mus musculus)

    PubMed Central

    Ben Salem, Nicole; Moens, Ugo

    2016-01-01

    Using generic PCR, we identified a variant of murine pneumotropic virus (MptV) (family Polyomaviridae) in 3 wild house mice (Mus musculus). The fully amplified and sequenced genomes display considerable differences from the MptV genomes published previously and enlighten us on the natural diversity of rodent polyomaviruses. PMID:26798094

  13. Complete Genome Sequence of Bovine Polyomavirus Type 1 from Aborted Cattle, Isolated in Belgium in 2014

    PubMed Central

    Rosseel, Toon; Behaeghel, Isabelle; Saulmont, Marc; Delooz, Laurent; Petitjean, Thierry; Mathijs, Elisabeth; Vandenbussche, Frank

    2016-01-01

    The complete and fully annotated genome sequence of a bovine polyomavirus type 1 (BPyV/BEL/1/2014) from aborted cattle was assembled from a metagenomics data set. The 4,697-bp circular dsDNA genome contains 6 protein-coding genes. Bovine polyomavirus is unlikely to be causally related to the abortion cases. PMID:26941154

  14. The Complete Genome Sequence of the Murine Pathobiont Helicobacter typhlonius

    PubMed Central

    Frank, Jeroen; Dingemanse, Celia; Schmitz, Arnoud M.; Vossen, Rolf H. A. M.; van Ommen, Gert-Jan B.; den Dunnen, Johan T.; Robanus-Maandag, Els C.; Anvar, Seyed Yahya

    2016-01-01

    Background: Immuno-compromised mice infected with Helicobacter typhlonius are used to model microbially induced inflammatory bowel disease (IBD). The specific mechanism through which H. typhlonius induces and promotes IBD is not fully understood. Access to the genome sequence is essential to examine emergent properties of this organism, such as its pathogenicity. To this end, we present the complete genome sequence of H. typhlonius MIT 97-6810, obtained through single-molecule real-time sequencing. Results: The genome was assembled into a single circularized contig measuring 1.92 Mbp with an average GC content of 38.8%. In total 2,117 protein-encoding genes and 43 RNA genes were identified. Numerous pathogenic features were found, including a putative pathogenicity island (PAI) containing components of a type IV secretion system, virulence-associated proteins and cag PAI protein. We compared the genome of H. typhlonius to those of the murine pathobiont H. hepaticus and human pathobiont H. pylori. H. typhlonius resembles H. hepaticus most, with 1,594 (75.3%) of its genes being orthologous to genes in H. hepaticus. Determination of the global methylation state revealed eight distinct recognition motifs for adenine and cytosine methylation. H. typhlonius shares four of its recognition motifs with H. pylori. Conclusion: The complete genome sequence of H. typhlonius MIT 97-6810 enabled us to identify many pathogenic features suggesting that H. typhlonius can act as a pathogen. Follow-up studies are necessary to evaluate the true nature of its pathogenic capabilities. We found many methylated sites and a plethora of restriction-modification systems. The genome, together with the methylome, will provide an essential resource for future studies investigating gene regulation, host interaction and pathogenicity of H. typhlonius. In turn, this work can contribute to unraveling the role of Helicobacter in enteric disease. PMID:26779178
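
    The "average GC content" figure quoted in this record is a simple proportion: the share of G and C bases among all unambiguous bases in the assembled sequence. A minimal Python sketch follows, assuming the sequence is available as a plain string; the function name gc_content is hypothetical and the toy inputs are not taken from the H. typhlonius genome.

```python
def gc_content(seq):
    """Return GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or other IUPAC codes
    # do not dilute the percentage.
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # Toy sequence for illustration only.
    print(gc_content("ATGCATGCAT"))
```

    Ambiguous bases are excluded from the denominator here; that is one common convention, and other tools may count them differently.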

  15. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-01

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research. PMID:26638067

  16. Genome Sequence of Corynebacterium ulcerans Strain 210932

    PubMed Central

    Viana, Marcus Vinicius Canário; de Jesus Benevides, Leandro; Batista Mariano, Diego Cesar; de Souza Rocha, Flávia; Bagano Vilas Boas, Priscilla Carolinne; Folador, Edson Luiz; Pereira, Felipe Luiz; Alves Dorella, Fernanda; Gomes Leal, Carlos Augusto; Fiorini de Carvalho, Alex; Silva, Artur; de Castro Soares, Siomar; Pereira Figueiredo, Henrique Cesar; Guimarães, Luis Carlos

    2014-01-01

    In this work, we present the complete genome sequence of Corynebacterium ulcerans strain 210932, isolated from a human. The species is an emergent pathogen that infects a variety of wild and domesticated animals and humans. It is associated with a growing number of cases of a diphtheria-like disease around the world. PMID:25428977

  17. Complete Genome Sequences of 61 Mycobacteriophages

    PubMed Central

    2016-01-01

    Mycobacteriophages (viruses of mycobacteria) provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc²155 that expand our understanding of phage diversity. PMID:27389257

  18. Draft Genome Sequence of Virgibacillus halodenitrificans 1806

    PubMed Central

    Lee, Sang-Jae; Lee, Yong-Jik; Jeong, Haeyoung; Lee, Sang Jun; Lee, Han-Seung; Pan, Jae-Gu

    2012-01-01

    Virgibacillus halodenitrificans 1806 is an endospore-forming halophilic bacterium isolated from salterns in Korea. Here, we report the draft genome sequence of V. halodenitrificans 1806, which may reveal the molecular basis of osmoadaptation and insights into carbon and anaerobic metabolism in moderate halophiles. PMID:23105070

  19. Genome Sequence of Gordonia Phage Emalyn

    PubMed Central

    Guido, Madeline J.; Iyengar, Pragnya; Nigra, Jonathan T.; Serbin, Matthew B.; Kasturiarachi, Naomi S.; Pressimone, Catherine A.; Schiebel, Johnathon G.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    Emalyn is a newly isolated bacteriophage of Gordonia terrae 3612 and has a double-stranded DNA genome 43,982 bp long with 67 predicted protein-encoding genes, 32 of which we can assign putative functions. Emalyn has a prolate capsid and has extensive nucleotide similarity with several previously sequenced phages. PMID:27516499

  20. Genome Sequence of Gordonia Phage Emalyn.

    PubMed

    Pope, Welkin H; Guido, Madeline J; Iyengar, Pragnya; Nigra, Jonathan T; Serbin, Matthew B; Kasturiarachi, Naomi S; Pressimone, Catherine A; Schiebel, Johnathon G; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-01-01

    Emalyn is a newly isolated bacteriophage of Gordonia terrae 3612 and has a double-stranded DNA genome 43,982 bp long with 67 predicted protein-encoding genes, 32 of which we can assign putative functions. Emalyn has a prolate capsid and has extensive nucleotide similarity with several previously sequenced phages. PMID:27516499

  1. Genome sequence of Lactobacillus amylovorus GRL1112.

    PubMed

    Kant, Ravi; Paulin, Lars; Alatalo, Edward; de Vos, Willem M; Palva, Airi

    2011-02-01

    Lactobacillus amylovorus is a common member of the normal gastrointestinal tract (GIT) microbiota in pigs. Here, we report the genome sequence of L. amylovorus GRL1112, a porcine feces isolate displaying strong adherence to the pig intestinal epithelial cells. The strain is of interest, as it is a potential probiotic bacterium. PMID:21131492

  2. Complete Genome Sequences of 61 Mycobacteriophages.

    PubMed

    Hatfull, Graham F

    2016-01-01

    Mycobacteriophages (viruses of mycobacteria) provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc²155 that expand our understanding of phage diversity. PMID:27389257

  3. Whole genome sequences of four Brucella strains.

    PubMed

    Ding, Jiabo; Pan, Yuanlong; Jiang, Hai; Cheng, Junsheng; Liu, Taotao; Qin, Nan; Yang, Yi; Cui, Buyun; Chen, Chen; Liu, Cuihua; Mao, Kairong; Zhu, Baoli

    2011-07-01

    Brucella melitensis and Brucella suis are intracellular pathogens of livestock and humans. Here we report four genome sequences, those of the virulent strain B. melitensis M28-12 and vaccine strains B. melitensis M5 and M111 and B. suis S2, which show different virulences and pathogenicities, which will help to design a more effective brucellosis vaccine. PMID:21602346

  4. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  5. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries, which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create an expressed sequence tag, or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.

  6. Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome

    PubMed Central

    Faino, Luigi; Seidl, Michael F.; Datema, Erwin; van den Berg, Grardy C. M.; Janssen, Antoine; Wittenberg, Alexander H. J.

    2015-01-01

    ABSTRACT Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. PMID:26286689

  7. Agaricus bisporus genome sequence: a commentary.

    PubMed

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus bisporus genomes contain a collection of polysaccharide and lignin-degrading genes and, more interestingly, an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium. PMID:23558250

  8. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  9. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    PubMed Central

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528
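
    To make the term concrete: a run of homozygosity is a stretch of consecutive sites at which an individual carries two identical alleles. The Python sketch below only illustrates that definition on a toy genotype vector. It is not the LD-based predictor or the error-correction method described in this record, and the function name, the 0/1/2 genotype coding, and the min_length threshold are all assumptions made for illustration.

```python
def runs_of_homozygosity(genotypes, min_length=3):
    """Find runs of consecutive homozygous sites.

    genotypes: sequence of alternate-allele counts per site
               (0 or 2 = homozygous, 1 = heterozygous); assumed coding.
    Returns a list of (start_index, run_length) tuples.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous site ends the current run, if any.
            if start is not None and i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # Close a run that extends to the end of the vector.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - start))
    return runs


if __name__ == "__main__":
    toy = [0, 0, 2, 1, 0, 1, 2, 2, 2, 2]
    print(runs_of_homozygosity(toy))
```

    In this toy model a single miscalled heterozygous site splits a long run in two, which gives some intuition for why uncorrected sequencing errors can bias inference from run-length distributions.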

  10. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics. PMID:24143115

  11. Genome Sequence of Psychrobacter cibarius Strain W1

    PubMed Central

    Raghupathi, Prem K.; Herschend, Jakob; Røder, Henriette L.; Sørensen, Søren J.

    2016-01-01

    Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs. PMID:27231353

  12. Draft Genome Sequence of Mycobacterium brumae ATCC 51384

    PubMed Central

    D'Auria, Giuseppe

    2016-01-01

    Here, we report the draft genome sequence of Mycobacterium brumae type strain ATCC 51384. This is the first draft genome sequence of M. brumae, a nonpathogenic, rapidly growing, nonchromogenic mycobacterium, with immunotherapeutic capacities. PMID:27125480

  13. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... have millions of different genomes, or sequences of genetic code, each as unique as a fingerprint. ...

  14. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  15. Simple sequence repeats in prokaryotic genomes

    PubMed Central

    Mrázek, Jan; Guo, Xiangxue; Shah, Apurva

    2007-01-01

    Simple sequence repeats (SSRs) in DNA sequences are composed of tandem iterations of short oligonucleotides and may have functional and/or structural properties that distinguish them from general DNA sequences. They are variable in length because of slip-strand mutations and may also affect local structure of the DNA molecule or the encoded proteins. Long SSRs (LSSRs) are common in eukaryotes but rare in most prokaryotes. In pathogens, SSRs can enhance antigenic variance of the pathogen population in a strategy that counteracts the host immune response. We analyze representations of SSRs in >300 prokaryotic genomes and report significant differences among different prokaryotes as well as among different types of SSRs. LSSRs composed of short oligonucleotides (1–4 bp length, designated LSSR1–4) are often found in host-adapted pathogens with reduced genomes that are not known to readily survive in a natural environment outside the host. In contrast, LSSRs composed of longer oligonucleotides (5–11 bp length, designated LSSR5–11) are found mostly in nonpathogens and opportunistic pathogens with large genomes. Comparisons among SSRs of different lengths suggest that LSSR1–4 are likely maintained by selection. This is consistent with the established role of some LSSR1–4 in enhancing antigenic variance. By contrast, abundance of LSSR5–11 in some genomes may reflect the SSRs' general tendency to expand rather than their specific role in the organisms' physiology. Differences among genomes in terms of SSR representations and their possible interpretations are discussed. PMID:17485665
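
    As an illustration of what "tandem iterations of short oligonucleotides" means in practice, the following Python sketch scans a DNA string for perfect tandem repeats with unit lengths of 1 to 4 bp (the LSSR1-4 range named in this record). It is a minimal sketch rather than the authors' method: the function name find_ssrs and the min_repeats cutoff are assumptions, and the record does not state the length threshold the study used to call an SSR "long".

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=4, min_repeats=5):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    min_repeats is an illustrative cutoff, not a value from the study.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A captured unit followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those are reported at the shorter length.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GGCATATATATATGGC"))
```

    Only perfect repeats are reported; repeats interrupted by a substitution or indel would need an approximate-matching approach instead.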

  16. Elucidating population histories using genomic DNA sequences.

    PubMed

    Vigilant, Linda

    2009-04-01

    In 1993, Cliff Jolly suggested that rather than debating species definitions and classifications, energy would be better spent investigating multidimensional patterns of variation and gene flow among populations. Until now, however, genetic studies of wild primate populations have been limited to very small portions of the genome. Access to complete genome sequences of humans, chimpanzees, macaques, and other primates makes it possible to design studies surveying substantial amounts of DNA sequence variation at multiple genetic loci in representatives of closely related but distinct wild primate populations. Such data can be analyzed with new approaches that estimate not only when populations diverged but also the relative amounts and directions of subsequent gene flow. These analyses will reemphasize the difficulty of achieving consistent species and subspecies definitions by revealing the extent of variation in the amount and duration of gene flow accompanying population divergences. PMID:19817223

  17. Complete genome sequence of Piry vesiculovirus.

    PubMed

    de Souza, William Marciel; Acrani, Gustavo Olszanski; Romeiro, Marilia Farignoli; Júnior, Osvaldo Reis; Tolardo, Aline Lavado; de Andrade, Amanda Araújo Serrão; da Silva Gonçalves Vianez Júnior, João Lídio; de Almeida Medeiros, Daniele Barbosa; Nunes, Márcio Roberto Teixeira; Figueiredo, Luiz Tadeu Moraes

    2016-08-01

    Piry virus (PIRYV) is a rhabdovirus (genus Vesiculovirus) and is described as a possible human pathogen, originally isolated from a Philander opossum trapped in Para State, Northern Brazil. This study describes the complete coding sequence and the genetic characterization of PIRYV. The genome sequence reveals that PIRYV has a typical vesiculovirus-like organization, encoding the five genes typical of the genus. Phylogenetic analysis confirmed that PIRYV is most closely related to Perinet virus and clustered in the same clade as Chandipura and Isfahan vesiculoviruses. PMID:27216928

  18. Complete Genome Sequence of Mycobacterium abscessus subsp. bolletii

    PubMed Central

    Spilker, Theodore; LiPuma, John J.

    2016-01-01

    We report the complete genome sequence of a Mycobacterium abscessus subsp. bolletii isolate recovered from a sputum culture from an individual with cystic fibrosis. This sequence is the first completed whole-genome sequence of M. abscessus subsp. bolletii and adds value to studies of M. abscessus complex genomics. PMID:27284156

  19. Genome Sequence of the Zoonotic Pathogen Chlamydophila psittaci

    PubMed Central

    Seth-Smith, Helena M. B.; Harris, Simon R.; Rance, Richard; West, Anthony P.; Severin, Juliette A.; Ossewaarde, Jacobus M.; Cutcliffe, Lesley T.; Skilton, Rachel J.; Marsh, Pete; Parkhill, Julian; Clarke, Ian N.; Thomson, Nicholas R.

    2011-01-01

    We present the first genome sequence of Chlamydophila psittaci, an intracellular pathogen of birds and a human zoonotic pathogen. A comparison with previously sequenced Chlamydophila genomes shows that, as in other chlamydiae, most of the genome diversity is restricted to the plasticity zone. The C. psittaci plasmid was also sequenced. PMID:21183672

  20. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    SciTech Connect

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N₂ as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H₂. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  1. Complete Genome Sequences of 138 Mycobacteriophages

    PubMed Central

    2012-01-01

    Bacteriophages are the most numerous biological entities in the biosphere, and although their genetic diversity is high, it remains ill defined. Mycobacteriophages, the viruses of mycobacterial hosts, provide insights into this diversity as well as tools for manipulating Mycobacterium tuberculosis. We report here the complete genome sequences of 138 new mycobacteriophages, which, together with the 83 mycobacteriophages previously reported, represent the largest collection of phages known to infect a single common host, Mycobacterium smegmatis mc²155. PMID:22282335

  2. Complete genome sequence of Candidatus Ruthia magnifica.

    PubMed

    Roeselers, Guus; Newton, Irene L G; Woyke, Tanja; Auchtung, Thomas A; Dilly, Geoffrey F; Dutton, Rachel J; Fisher, Meredith C; Fontanez, Kristina M; Lau, Evan; Stewart, Frank J; Richardson, Paul M; Barry, Kerrie W; Saunders, Elizabeth; Detter, John C; Wu, Dongying; Eisen, Jonathan A; Cavanaugh, Colleen M

    2010-01-01

    The hydrothermal vent clam Calyptogena magnifica (Bivalvia: Mollusca) is a member of the Vesicomyidae. Species within this family form symbioses with chemosynthetic Gammaproteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a rudimentary gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. The C. magnifica symbiont, Candidatus Ruthia magnifica, was the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced (Newton et al. 2007). Here we expand upon the original report and provide additional details complying with the emerging MIGS/MIMS standards. The complete genome exposed the genetic blueprint of the metabolic capabilities of the symbiont. Genes which were predicted to encode the proteins required for all the metabolic pathways typical of free-living chemoautotrophs were detected in the symbiont genome. These include major pathways including carbon fixation, sulfur oxidation, nitrogen assimilation, as well as amino acid and cofactor/vitamin biosynthesis. This genome sequence is invaluable in the study of these enigmatic associations and provides insights into the origin and evolution of autotrophic endosymbiosis. PMID:21304746

  3. The predictive capacity of personal genome sequencing.

    PubMed

    Roberts, Nicholas J; Vogelstein, Joshua T; Parmigiani, Giovanni; Kinzler, Kenneth W; Vogelstein, Bert; Velculescu, Victor E

    2012-05-01

    New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a "genometype." A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the maximum capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that (i) for 23 of the 24 diseases, most of the individuals will receive negative test results; (ii) these negative test results will, in general, not be very informative, because the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 to 80% of that in the general population; and (iii) on the positive side, in the best-case scenario, more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy-makers, and consumers. PMID:22472521

  4. Assessing the Costs and Cost-Effectiveness of Genomic Sequencing

    PubMed Central

    Christensen, Kurt D.; Dukhovny, Dmitry; Siebert, Uwe; Green, Robert C.

    2015-01-01

    Despite dramatic drops in DNA sequencing costs, concerns are great that the integration of genomic sequencing into clinical settings will drastically increase health care expenditures. This commentary presents an overview of what is known about the costs and cost-effectiveness of genomic sequencing. We discuss the cost of germline genomic sequencing, addressing factors that have facilitated the decrease in sequencing costs to date and anticipating the factors that will drive sequencing costs in the future. We then address the cost-effectiveness of diagnostic and pharmacogenomic applications of genomic sequencing, with an emphasis on the implications for secondary findings disclosure and the integration of genomic sequencing into general patient care. Throughout, we ground the discussion by describing efforts in the MedSeq Project, an ongoing randomized controlled clinical trial, to understand the costs and cost-effectiveness of integrating whole genome sequencing into cardiology and primary care settings. PMID:26690481

  5. Global Genomic Diversity of Human Papillomavirus 6 Based on 724 Isolates and 190 Complete Genome Sequences

    PubMed Central

    Jelen, Mateja M.; Chen, Zigui; Kocjan, Boštjan J.; Burt, Felicity J.; Chan, Paul K. S.; Chouhy, Diego; Combrinck, Catharina E.; Coutlée, François; Estrade, Christine; Ferenczy, Alex; Fiander, Alison; Franco, Eduardo L.; Garland, Suzanne M.; Giri, Adriana A.; González, Joaquín Víctor; Gröning, Arndt; Heidrich, Kerstin; Hibbitts, Sam; Hošnjak, Lea; Luk, Tommy N. M.; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Richardson, Harriet; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y.; Seme, Katja; Severini, Alberto; Sinchi, Jessica L.; Smahelova, Jana; Tabrizi, Sepehr N.; Tachezy, Ruth; Tohme, Sarah; Uloza, Virgilijus; Vitkauskiene, Astra; Wong, Yong Wee; Židovec Lepej, Snježana; Burk, Robert D.

    2014-01-01

    ABSTRACT Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. IMPORTANCE This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. 
Two HPV

  6. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  7. Functional genomics of tomato in a post-genome-sequencing phase

    PubMed Central

    Aoki, Koh; Ogata, Yoshiyuki; Igarashi, Kaori; Yano, Kentaro; Nagasaki, Hideki; Kaminuma, Eli; Toyoda, Atsushi

    2013-01-01

    Completion of the tomato genome sequencing project has had a broad impact on genetic and genomic studies of tomato and other Solanaceae plants. The reference genome sequence derived from Solanum lycopersicum cv ‘Heinz 1706’ serves as a firm basis for sequencing-based approaches to tomato genomics. In this article, we first present a brief summary of the genome sequencing project and a summary of the reference genome sequence. We then focus on recent progress in transcriptome sequencing and small RNA sequencing and show how the reference genome sequence makes these analyses more comprehensive than before. We discuss the potential of in-depth analysis that is based on DNA methylome sequencing and transcription start-site detection. Finally, we describe the current status of efforts to resequence S. lycopersicum cultivars to demonstrate how resequencing can allow the use of intraspecific genomic diversity for detailed phenotyping and breeding. PMID:23641177

  8. Whole Chloroplast Genome Sequencing in Fragaria Using Deep Sequencing: A Comparison of Three Methods

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chloroplast sequences previously investigated in Fragaria revealed low amounts of variation. Deep sequencing technologies enable economical sequencing of complete chloroplast genomes. These sequences can potentially provide robust phylogenetic resolution, even at low taxonomic levels within plant gr...

  9. Porcine parvovirus: DNA sequence and genome organization.

    PubMed

    Ranz, A I; Manclús, J J; Díaz-Aroca, E; Casal, J I

    1989-10-01

    We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization of the PPV genome is similar to those of other autonomous parvoviruses. There are two large open reading frames (ORFs) that almost entirely cover the genome, both located in the same frame of the complementary strand. The left ORF encodes the non-structural protein NS1 and the right ORF encodes the capsid proteins (VP1, VP2 and VP3). Promoter analysis, location of splicing sites and putative amino acid sequences for the viral proteins show a high homology of PPV with feline panleukopenia virus and canine parvoviruses (FPV and CPV) and rodent parvovirus. Therefore we conclude that PPV is related to the Kilham rat virus (KRV) group of autonomous parvoviruses formed by KRV, minute virus of mice, Lu III, H-1, FPV and CPV. PMID:2794971
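
    As an illustration of the open reading frame (ORF) analysis mentioned in this abstract, the following is a minimal sketch of a six-frame ORF scan. It is not the authors' method: the function names, the ATG-to-stop ORF definition, the `min_codons` threshold and the toy sequence are all assumptions made for illustration.

```python
# Minimal six-frame ORF scan (illustrative sketch, not the method of the cited study).
# Assumption: an ORF runs from an ATG codon to the next in-frame stop codon.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end) tuples for ORFs in all six frames.

    start/end are 0-based, end-exclusive offsets on the scanned strand
    (for "-" they refer to the reverse-complemented sequence); the stop
    codon is included in the span but not counted in min_codons.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return results


# Hypothetical toy sequence: one short ORF (ATG AAA TAG) in forward frame 2.
print(find_orfs("CCATGAAATAGCC"))
```

    Two ORFs reported with the same strand and frame values correspond to the situation the abstract describes, where both large ORFs lie in the same frame of one strand.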

  10. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  11. Whole genome sequences of two octogenarians with sustained cognitive abilities

    PubMed Central

    Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L.; Baranzini, Sergio E.; Kramer, Joel H.; Oksenberg, Jorge R.

    2014-01-01

    Although numerous genetic variants affecting aging and mortality have been identified, e.g. APOE ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of two female octogenarians. We provide the repertoire of genomic variants that the two octogenarians have in common. We also describe the overlap with the previously reported genomes of two supercentenarians - individuals aged ≥ 110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the two octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased lifespan. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. PMID:25618617

  12. Draft Genome Sequence of Streptomyces hygroscopicus subsp. hygroscopicus NBRC 16556.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Oguchi, Akio; Hamada, Moriyuki; Tamura, Tomohiko; Suzuki, Ken-Ichiro; Fujita, Nobuyuki

    2016-01-01

    Here, we report the draft genome sequence of strain NBRC 16556, deposited as Streptomyces hygroscopicus subsp. hygroscopicus into the NBRC culture collection. An average nucleotide identity analysis confirmed that the taxonomic identification is correct. The genome sequence will serve as a valuable reference for genome mining to search for new secondary metabolites. PMID:27198007

  13. Draft Genome Sequence of Fungus Clonostachys rosea Strain YKD0085

    PubMed Central

    Liu, Shuai; Chang, Yaowen; Hu, Xujia; Gong, Xuanyun; Hao, Xiaojiang

    2016-01-01

    Here, we report the draft genome sequence of Clonostachys rosea (strain YKD0085). The functional annotation of C. rosea provides important information related to its ability to produce secondary metabolites. The genome sequence presented here builds the basis for further genome mining. PMID:27340057

  14. Complete Genome Sequence of Staphylococcus aureus Siphovirus Phage JS01

    PubMed Central

    Jia, Hongying; Bai, Qinqin; Yang, Yongchun

    2013-01-01

    Staphylococcus aureus is the most prevalent and economically significant pathogen causing bovine mastitis. We isolated and characterized one staphylophage from the milk of mastitis-affected cattle and sequenced its genome. Transmission electron microscopy (TEM) observation shows that it belongs to the family Siphovirus. We announce here its complete genome sequence and report major findings from the genomic analysis. PMID:24233583

  15. First Draft Genome Sequence of Staphylococcus condimenti F-2T

    PubMed Central

    Zheng, Beiwen; Hu, Xinjun; Jiang, Xiawei; Li, Ang; Yao, Jian

    2016-01-01

    This report describes the draft genome sequence of S. condimenti strain F-2T (DSM 11674), a potential starter culture. The genome assembly comprised 2,616,174 bp with 34.6% GC content. To the best of our knowledge, this is the first documentation that reports the whole-genome sequence of S. condimenti. PMID:27257207
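
    Assembly summaries such as the one above report GC content as a percentage of the assembled bases. A minimal sketch of that calculation follows; the function name, the choice to ignore ambiguous bases such as N, and the toy sequence are assumptions for illustration, and the announcement itself does not describe how the figure was computed.

```python
def gc_content(seq):
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Assumption: ambiguous bases (e.g. N) are excluded from the denominator.
    return 100.0 * gc / (gc + at)


# Hypothetical toy sequence: 4 G/C bases out of 10 unambiguous bases.
print(round(gc_content("ATGCGCNNATTA"), 1))
```

    For a real assembly, the same function would be applied to the concatenation of all contigs (or the counts summed contig by contig) rather than to a single short string.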

  16. Draft Genome Sequence of Alternaria alternata ATCC 34957.

    PubMed

    Nguyen, Hai D T; Lewis, Christopher T; Lévesque, C André; Gräfenhan, Tom

    2016-01-01

    We report the draft genome sequence of Alternaria alternata ATCC 34957. This strain was previously reported to produce alternariol and alternariol monomethyl ether on weathered grain sorghum. The genome was sequenced with PacBio technology and assembled into 27 scaffolds with a total genome size of 33.5 Mb. PMID:26769939

  17. First Draft Genome Sequence of Staphylococcus condimenti F-2T.

    PubMed

    Zheng, Beiwen; Hu, Xinjun; Jiang, Xiawei; Li, Ang; Yao, Jian; Li, Lanjuan

    2016-01-01

    This report describes the draft genome sequence of S. condimenti strain F-2(T) (DSM 11674), a potential starter culture. The genome assembly comprised 2,616,174 bp with 34.6% GC content. To the best of our knowledge, this is the first documentation that reports the whole-genome sequence of S. condimenti. PMID:27257207

  18. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  19. Draft Genome Sequence of Fungus Clonostachys rosea Strain YKD0085.

    PubMed

    Liu, Shuai; Chang, Yaowen; Hu, Xujia; Gong, Xuanyun; Di, Yingtong; Dong, Jinyan; Hao, Xiaojiang

    2016-01-01

    Here, we report the draft genome sequence of Clonostachys rosea (strain YKD0085). The functional annotation of C. rosea provides important information related to its ability to produce secondary metabolites. The genome sequence presented here builds the basis for further genome mining. PMID:27340057

  20. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    PubMed Central

    Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources. PMID:23045509

  1. Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomes

    PubMed Central

    Rihtman, Branko; Meaden, Sean; Clokie, Martha R.J.; Koskella, Britt

    2016-01-01

    Bacteriophages are the most abundant biological entities on the planet, playing crucial roles in the shaping of bacterial populations. Phages have smaller genomes than their bacterial hosts, yet there are currently fewer fully sequenced phage than bacterial genomes. We assessed the suitability of Illumina technology for high-throughput sequencing and subsequent assembly of phage genomes. In silico datasets reveal that 30× coverage is sufficient to correctly assemble the complete genome of ~98.5% of known phages, with experimental data confirming that the majority of phage genomes can be assembled at 30× coverage. Furthermore, in silico data demonstrate it is possible to co-sequence multiple phages from different hosts, without introducing assembly errors. PMID:27280068
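
    The 30× figure above is a mean sequencing depth. Under the usual definition, depth equals the number of reads times the read length divided by the genome size. The sketch below applies that definition; the function names, the 150-bp read length and the 50,000-bp genome size are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 50,000-bp phage genome sequenced with 150-bp reads.
n = reads_needed(30, 150, 50_000)
print(n, mean_coverage(n, 150, 50_000))
```

    Mean depth says nothing about how evenly reads are distributed, so it is only a first-order planning figure.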

  2. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    PubMed Central

    Anderson, Iain J.; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren; Land, Miriam; Lucas, Susan; Richardson, Paul; Whitman, William B.; Kyrpides, Nikos C.

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal kingdom Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea. PMID:21304657

  3. TAG Sequence Identification of Genomic Regions Using TAGdb.

    PubMed

    Ruperao, Pradeep

    2016-01-01

    Second-generation sequencing (SGS) technology has enabled the sequencing of genomes and identification of genes. However, large complex plant genomes remain particularly difficult for de novo assembly. Access to the vast quantity of raw sequence data may facilitate discoveries; however the volume of this data makes access difficult. This chapter discusses the Web-based tool TAGdb that enables researchers to identify paired read second-generation DNA sequence data that share identity with a submitted query sequence. The identified reads can be used for PCR amplification of genomic regions to identify genes and promoters without the need for genome assembly. PMID:26519409

  4. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  5. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  6. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  7. De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data

    PubMed Central

    DiGuistini, Scott; Liao, Nancy Y; Platt, Darren; Robertson, Gordon; Seidel, Michael; Chan, Simon K; Docking, T Roderick; Birol, Inanc; Holt, Robert A; Hirst, Martin; Mardis, Elaine; Marra, Marco A; Hamelin, Richard C; Bohlmann, Jörg; Breuil, Colette; Jones, Steven JM

    2009-01-01

    Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data. PMID:19747388

  8. Detecting long tandem duplications in genomic sequences

    PubMed Central

    2012-01-01

    Background Detecting duplicated segments within completely sequenced genomes provides valuable information for addressing genome evolution and, in particular, the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. Conclusions ReD Tandem identifies large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations. PMID:22568762

  9. Mitochondrial genome sequencing in atherosclerosis: what's next?

    PubMed

    Sazonova, Margarita A; Shkurat, Tatiana P; Demakova, Natalya A; Zhelankin, Andrey V; Barinova, Valeria A; Sobenin, Igor A; Orekhov, Alexander N

    2016-01-01

    Cardiovascular diseases are currently a leading cause of mortality in highly developed countries, and atherosclerosis is the major reason for their genesis and development. High-technology methods of molecular genetic diagnostics can now significantly simplify early presymptomatic recognition of patients with atherosclerosis, the detection of risk groups, and family analysis of this pathology. Next-generation sequencing (NGS) technology is characterized by high throughput and low cost of full-genome analysis of each DNA sample. We expect that in the near future NGS methods will be widely used for scientific and diagnostic purposes, including personalized medicine. This review describes literature data on the use of NGS technology to study mitochondrial genome mutations associated with atherosclerosis and its risk factors, such as mitochondrial diabetes, mitochondrial cardiomyopathy, diabetic nephropathy and left ventricular hypertrophy. NGS technology has made it possible to detect a range of homoplasmic and heteroplasmic mutations and mitochondrial genome haplogroups that are associated with these pathologies. Some mutations and haplogroups were detected both in atherosclerosis and in its risk factors, which suggests that common pathogenetic mechanisms cause these pathologies. What comes next? A new paradigm of crosstalk between non-pharmaceutical (including molecular genetic) and true pharmaceutical approaches may be developed to fill the niche of effective and pathogenically targeted pretreatment and treatment of preclinical and subclinical atherosclerosis to avoid the development of chronic life-threatening disease. PMID:26561059

  10. First Complete Genome Sequence of Cherry virus A.

    PubMed

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2016-01-01

    The 5'-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5'-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  11. First Complete Genome Sequence of Cherry virus A

    PubMed Central

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki

    2016-01-01

    The 5′-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5′-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  12. Complete Genomic Sequence of Duck Flavivirus from China

    PubMed Central

    Liu, Ming; Liu, Chunguo; Li, Gang; Li, Xiaojun; Yin, Xiuchen; Chen, Yuhuan

    2012-01-01

    We report here the complete genomic sequence of the Chinese duck flavivirus TA strain. This work is the first to document the complete genomic sequence of this previously unknown duck flavivirus strain. The sequence will help further relevant epidemiological studies and extend our general knowledge of flaviviruses. PMID:22354941

  13. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo

    PubMed Central

    Horton, Daniel L.; Marston, Denise A.; Johnson, Nicholas; Ellis, Richard J.; Fooks, Anthony R.; Hewson, Roger

    2016-01-01

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates. PMID:27081121

  14. Draft Genome Sequence of the Archiascomycetous Yeast Saitoella complicata

    PubMed Central

    Yamauchi, Kenta; Hamamoto, Makiko; Takahashi, Yurika; Ogura, Yoshitoshi; Hayashi, Tetsuya

    2015-01-01

    The draft genome sequence of the archiascomycetous yeast Saitoella complicata was determined. The assembly of newly and previously sequenced data sets resulted in 104 contigs (total of 14.1 Mbp; N50, 239 kbp). On the newly assembled genome, a total of 6,933 protein-coding sequences (7,119 transcripts, including alternative splicing forms) were identified. PMID:26021914
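
    The N50 statistic quoted above is the contig length at which, summing contigs from longest to shortest, at least half of the total assembly length has been reached. A minimal sketch of that calculation follows; the function name and the toy contig lengths are assumptions for illustration and are not data from the announcement.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are summed from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids floating-point halves
            return length


# Hypothetical toy assembly: total length 32, so half is 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

    A larger N50 for the same total length indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside contig count and total size.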

  15. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to state-of-the-art genomics capabilities: next-generation sequencing, Sanger sequencing/genotyping, and microarrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high-throughput sequencing analysis.

  16. Genome sequencing of the important oilseed crop Sesamum indicum L.

    PubMed

    Zhang, Haiyang; Miao, Hongmei; Wang, Lei; Qu, Lingbo; Liu, Hongyan; Wang, Qiang; Yue, Meiwang

    2013-01-01

    The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described. PMID:23369264

  17. Current challenges in de novo plant genome sequencing and assembly

    PubMed Central

    2012-01-01

    Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community. PMID:22546054

  18. First complete genome sequence of infectious laryngotracheitis virus

    PubMed Central

    2011-01-01

    Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

  19. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Draft Genome Sequences of Klebsiella variicola Plant Isolates.

    PubMed

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena; Garza-Ramos, Ulises

    2015-01-01

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  1. Draft Genome Sequences of Klebsiella variicola Plant Isolates

    PubMed Central

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena

    2015-01-01

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  2. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org. PMID:27252399

  3. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  4. Genome sequencing and annotation of Proteus sp. SAS71

    PubMed Central

    Selim, Samy; Hassan, Sherif; Hagagy, Nashwa

    2015-01-01

    We report the draft genome sequence of Proteus sp. strain SAS71, isolated from a water spring in the Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000. PMID:26697338

  5. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  6. Sequencing and assembly of the 22-gb loblolly pine genome.

    PubMed

    Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

    2014-03-01

    Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. PMID:24653210
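
    The N50 scaffold size quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation, assuming only a list of scaffold lengths; the function name `n50` and the toy lengths are illustrative and are not the loblolly pine data:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length of the sequence at which the running total first reaches
    half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy scaffold lengths (arbitrary units), for illustration only.
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(scaffolds))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

    Because N50 depends only on the length distribution, it says nothing about correctness: an assembler that joins scaffolds wrongly can raise N50 while lowering accuracy.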

  7. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  8. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now

    PubMed Central

    Engel, Stacia R.; Dietrich, Fred S.; Fisk, Dianna G.; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C.; Dwight, Selina S.; Hitz, Benjamin C.; Karra, Kalpana; Nash, Robert S.; Weng, Shuai; Wong, Edith D.; Lloyd, Paul; Skrzypek, Marek S.; Miyasato, Stuart R.; Simison, Matt; Cherry, J. Michael

    2014-01-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  9. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory, a number of fungi are used as experimental systems to study a wide range of biological processes, and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize, at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  10. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution. PMID:26620110

  11. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome
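
    The pan- and core genomes mentioned above reduce to set operations once each genome has been summarized as a set of gene families: the pan-genome is the union over all genomes and the core genome is the intersection. A minimal sketch under that assumption; the strain names, family identifiers and the function `pan_and_core` are hypothetical, and the clustering of genes into families (the hard part in practice) is taken as given:

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genomes.

    `genomes` maps a genome name to an iterable of gene-family IDs.
    The pan-genome is every family seen in at least one genome; the
    core genome is every family present in all genomes.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    families = [set(members) for members in genomes.values()]
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core


# Hypothetical toy input: three strains and five gene families.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}
pan, core = pan_and_core(toy_genomes)
print(len(pan), sorted(core))  # 5 families in total, fam1 and fam2 shared by all
```

    As more genomes are added, the union can only grow and the intersection can only shrink, which is consistent with the gap the abstract reports between about 3100 core families and about 89,000 total families for E. coli.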

  12. Insights from 20 years of bacterial genome sequencing.

    PubMed

    Land, Miriam; Hauser, Loren; Jun, Se-Ran; Nookaew, Intawat; Leuze, Michael R; Ahn, Tae-Hyuk; Karpinets, Tatiana; Lund, Ole; Kora, Guruprased; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-03-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome

  13. Coevolution between simple sequence repeats (SSRs) and virus genome size

    PubMed Central

    2012-01-01

    Background The relationship between the level of repetitiveness in genomic sequence and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but has rarely been studied in virus genomes. Results In this study, a total of 257 viruses were examined, covering 90% of genera. The results showed that the occurrence of simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. Each repeat class is distributed within a certain range of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes; tetra- SSRs are a common component of genomes more than 100 kb in size; and among genomes < 100 kb, no more than 50% contain penta- and hexa- SSRs. Principal components analysis (PCA) indicated that dinucleotide repeats contribute most strongly to the differences in SSRs among virus genomes. The results showed that SSRs tend to accumulate in larger virus genomes, and that the longer the genome sequence, the longer the repeat units. Conclusions We conducted this research at the level of viruses as a whole. We concluded that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible, to a certain degree, for the variance in SSR content. PMID:22931422
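
    The repeat classes discussed above (mono- through hexa-nucleotide SSRs) can be located with a simple pattern search. The abstract does not state the minimum repeat counts the authors used, so the thresholds below are assumptions chosen only for illustration, as are the function names and the toy sequence. A minimal sketch using Python's standard `re` module:

```python
import re

# Assumed minimum number of tandem copies per unit length (1-6 nt).
# These thresholds are illustrative, not the criteria used in the study.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a tandem repeat of a shorter motif."""
    k = len(unit)
    return not any(
        k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(sequence):
    """Return (start, unit, copies) for each perfect SSR, sorted by start."""
    seq = sequence.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # One captured unit of length k followed by at least min_copies - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):  # e.g. skip "AA" reported as a di-repeat
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence for illustration only.
toy = "GGATATATATCCAAAAAAAGTCAGCAGCAGTT"
for start, unit, copies in find_ssrs(toy):
    print(start, unit, copies)
```

    Only perfect (uninterrupted) repeats are reported here; dedicated SSR tools typically also handle imperfect and compound repeats, which this sketch does not attempt.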

  14. Genome Sequence of the Repetitive-Sequence-Rich Mycoplasma fermentans Strain M64

    PubMed Central

    Shu, Hung-Wei; Liu, Tze-Tze; Chan, Huang-I; Liu, Yen-Ming; Wu, Keh-Ming; Shu, Hung-Yu; Tsai, Shih-Feng; Hsiao, Kwang-Jen; Hu, Wensi S.; Ng, Wailap Victor

    2011-01-01

    Mycoplasma fermentans is a microorganism commonly found in the genitourinary and respiratory tracts of healthy individuals and AIDS patients. The complete genome of the repetitive-sequence-rich M. fermentans strain M64 is reported here. Comparative genomics analysis revealed dramatic differences in genome size between this strain and the recently completely sequenced JER strain. PMID:21642450

  15. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  16. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  17. Complete Genome Sequence of Bacillus subtilis BAB-1, a Biocontrol Agent for Suppression of Tomato Gray Mold

    PubMed Central

    Guo, Qinggang; Li, Shezeng; Lu, Xiuyun; Zhang, Xiaoyun; Wang, Peipei

    2014-01-01

    Bacillus subtilis BAB-1, isolated from cotton rhizosphere soil, is an excellent biocontrol agent for tomato gray mold. The genome of B. subtilis strain BAB-1 was fully sequenced and annotated, genes encoding the antifungal active compound were identified, and multiple sets of regulatory systems were found in the genome. PMID:25103757

  18. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

  19. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

  20. Genome sequence of Xanthomonas axonopodis pv. punicae strain LMG 859.

    PubMed

    Sharma, Vikas; Midha, Samriti; Ranjan, Manish; Pinnaka, Anil Kumar; Patil, Prabhu B

    2012-05-01

    We report the 4.94-Mb genome sequence of Xanthomonas axonopodis pv. punicae strain LMG 859, the causal agent of bacterial leaf blight disease in pomegranate. The draft genome will aid in comparative genomics, epidemiological studies, and quarantine of this devastating phytopathogen. PMID:22493202

  1. SEQUENCING THE PIG GENOME USING A BAC BY BAC APPROACH

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have generated a highly contiguous physical map covering >98% of the pig genome in just 176 contigs. The map is localized to the genome through integration with the UIVC RH map as well as BAC end sequence alignments to the human genome. Over 265k HindIII restriction digest fingerprints totaling 16.2...

  2. First Complete Genome Sequence of a Subdivision 6 Acidobacterium Strain

    PubMed Central

    Vieira, Selma; Bunk, Boyke; Riedel, Thomas; Spröer, Cathrin; Overmann, Jörg

    2016-01-01

    Although ubiquitous and abundant in soils, acidobacteria have mostly escaped isolation and remain poorly investigated. Only a few cultured representatives and just eight genomes of subdivisions 1, 3, and 4 are available to date. Here, we determined the complete genome sequence of strain HEG_-6_39, the first genome of Acidobacterium subdivision 6. PMID:27231379

  3. Genome Sequence of Mushroom Soft-Rot Pathogen Janthinobacterium agaricidamnosum

    PubMed Central

    Graupner, Katharina; Lackner, Gerald

    2015-01-01

    Janthinobacterium agaricidamnosum causes soft-rot disease of the cultured button mushroom Agaricus bisporus and is thus responsible for agricultural losses. Here, we present the genome sequence of J. agaricidamnosum DSM 9628. The 5.9-Mb genome harbors several secondary metabolite biosynthesis gene clusters, which renders this neglected bacterium a promising source for genome mining approaches. PMID:25883287

  4. Genome Sequence of Mushroom Soft-Rot Pathogen Janthinobacterium agaricidamnosum.

    PubMed

    Graupner, Katharina; Lackner, Gerald; Hertweck, Christian

    2015-01-01

    Janthinobacterium agaricidamnosum causes soft-rot disease of the cultured button mushroom Agaricus bisporus and is thus responsible for agricultural losses. Here, we present the genome sequence of J. agaricidamnosum DSM 9628. The 5.9-Mb genome harbors several secondary metabolite biosynthesis gene clusters, which renders this neglected bacterium a promising source for genome mining approaches. PMID:25883287

  5. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate

    PubMed Central

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T.

    2015-01-01

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available. PMID:26450735

  6. Genome Sequence of Xanthomonas axonopodis pv. punicae Strain LMG 859

    PubMed Central

    Sharma, Vikas; Midha, Samriti; Ranjan, Manish; Pinnaka, Anil Kumar

    2012-01-01

    We report the 4.94-Mb genome sequence of Xanthomonas axonopodis pv. punicae strain LMG 859, the causal agent of bacterial leaf blight disease in pomegranate. The draft genome will aid in comparative genomics, epidemiological studies, and quarantine of this devastating phytopathogen. PMID:22493202

  7. Almost finished: the complete genome sequence of Mycosphaerella graminicola

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycosphaerella graminicola causes septoria tritici blotch of wheat. An 8.9x shotgun sequence of bread wheat strain IPO323 was generated through the Community Sequencing Program of the U.S. Department of Energy’s Joint Genome Institute (JGI), and was finished at the Stanford Human Genome Center. The ...

  8. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  9. The carrot genome sequence brings colors out of the dark.

    PubMed

    Garcia-Mas, Jordi; Rodriguez-Concepcion, Manuel

    2016-05-27

    The genome sequence of carrot (Daucus carota L.) is the first completed for an Apiaceae species, furthering knowledge of the evolution of the important euasterid II clade. Analyzing the whole-genome sequence allowed for the identification of a gene that may regulate the accumulation of carotenoids in the root. PMID:27230684

  10. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  11. Complete Genome Sequence of Enterococcus faecium ATCC 700221.

    PubMed

    McKenney, Peter T; Ling, Lilan; Wang, Guilin; Mane, Shrikant; Pamer, Eric G

    2016-01-01

    We report the complete genome sequence of a vancomycin-resistant isolate of Enterococcus faecium derived from human feces. The genome comprises one chromosome of 2.9 Mb and three plasmids. The strain harbors a plasmid-borne vanA-type vancomycin resistance locus and is a member of multilocus sequencing type (MLST) cluster ST-17. PMID:27198022

  12. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    SciTech Connect

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  13. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge

    PubMed Central

    Reveille, Alexandra M.; Eldridge, Kimberly A.

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  14. Draft Genome Sequence of the Fish Pathogen Piscirickettsia salmonis.

    PubMed

    Eppinger, Mark; McNair, Katelyn; Zogaj, Xhavit; Dinsdale, Elizabeth A; Edwards, Robert A; Klose, Karl E

    2013-01-01

    Piscirickettsia salmonis is a Gram-negative intracellular fish pathogen that has a significant impact on the salmon industry. Here, we report the genome sequence of P. salmonis strain LF-89. This is the first draft genome sequence of P. salmonis, and it reveals interesting attributes, including flagellar genes, despite this bacterium being considered nonmotile. PMID:24201203

  15. Complete Genome Sequences of Five Paenibacillus larvae Bacteriophages.

    PubMed

    Sheflo, Michael A; Gardner, Adam V; Merrill, Bryan D; Fisher, Joshua N B; Lunt, Bryce L; Breakwell, Donald P; Grose, Julianne H; Burnett, Sandra H

    2013-01-01

    Paenibacillus larvae is a pathogen of honeybees that causes American foulbrood (AFB). We isolated bacteriophages from soil containing bee debris collected near beehives in Utah. We announce five high-quality complete genome sequences, which represent the first completed genome sequences submitted to GenBank for any P. larvae bacteriophage. PMID:24233582

  16. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  17. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  18. Draft Genome Sequence of the Fish Pathogen Piscirickettsia salmonis

    PubMed Central

    Eppinger, Mark; McNair, Katelyn; Zogaj, Xhavit; Dinsdale, Elizabeth A.; Edwards, Robert A.

    2013-01-01

    Piscirickettsia salmonis is a Gram-negative intracellular fish pathogen that has a significant impact on the salmon industry. Here, we report the genome sequence of P. salmonis strain LF-89. This is the first draft genome sequence of P. salmonis, and it reveals interesting attributes, including flagellar genes, despite this bacterium being considered nonmotile. PMID:24201203

  19. Complete Genome Sequence of a Clinical Isolate of Enterobacter asburiae

    PubMed Central

    Liu, Feng; Yang, Jian; Xiao, Yan; Li, Li; Jin, Qi

    2016-01-01

    We report here the complete genome sequence of Enterobacter asburiae strain ENIPBJ-CG1, isolated from a bone marrow transplant patient. The size of the genome sequence is approximately 4.65 Mb, with a G+C content of 55.76%, and it is predicted to contain 4,790 protein-coding genes. PMID:27284137

  20. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    PubMed Central

    Bayjanov, Jumamurat R.; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; Siezen, Roland; van Hijum, Sacha A. F. T.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence. PMID:26607887

  1. Full Genome Sequence of a Bovine Enterovirus Isolated in China

    PubMed Central

    Peng, Xiao-wei; Dong, Hao; Wu, Qing-min

    2014-01-01

    We report the full genome sequence of an isolate of bovine enterovirus type B from China. The virus (BEV-BJ001) was isolated from Beijing, China, from fecal swabs of cattle suffering from severe diarrhea. This genome sequence will give useful insight for future molecular epidemiological studies in China. PMID:24970832

  2. On the current status of Phakopsora pachyrhizi genome sequencing

    PubMed Central

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the current status of Phakopsora pachyrhizi (Asian soybean rust) genome sequencing. PMID:25221558

  3. On the current status of Phakopsora pachyrhizi genome sequencing.

    PubMed

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the current status of Phakopsora pachyrhizi (Asian soybean rust) genome sequencing. PMID:25221558

  4. ProDeGe: A Computational Protocol for fully Automated Decontamination of Genomic Data

    Energy Science and Technology Software Center (ESTSC)

    2015-12-01

    The Single Cell Data Decontamination Pipeline is a fully-automated software tool which classifies unscreened contigs from single cell datasets through a combination of homology and feature-based methodologies using the organism's nucleotide sequences and known NCBI taxonomy. The software is freely available to download and install, and can be run on any system.

  5. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. 
The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  6. Minimum taxonomic criteria for bacterial genome sequence depositions and announcements.

    PubMed

    Bull, Matthew J; Marchesi, Julian R; Vandamme, Peter; Plummer, Sue; Mahenthiralingam, Eshwar

    2012-04-01

    Multiple bioinformatic methods are available to analyse the information encoded within the complete genome sequence of a bacterium and accurately assign its species status or nearest phylogenetic neighbour. However, it is clear that even now in what is the third decade of bacterial genomics, taxonomically incorrect genome sequence depositions are still being made. We outline a simple scheme of bioinformatic analysis and a set of minimum criteria that should be applied to all bacterial genomic data to ensure that they are accurately assigned to the species or genus level prior to database deposition. To illustrate the utility of the bioinformatic workflow, we analysed the recently deposited genome sequence of Lactobacillus acidophilus 30SC and demonstrated that this DNA was in fact derived from a strain of Lactobacillus amylovorus. Using these methods researchers can ensure that the taxonomic accuracy of genome sequence depositions is maintained within the ever increasing nucleic acid datasets. PMID:22366464

  7. From complete genome sequence to “complete” understanding?

    PubMed Central

    Galperin, Michael Y.; Koonin, Eugene V.

    2011-01-01

    The rapidly accumulating genome sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. A major problem in genomics is the widening gap between the rapid progress in genome sequencing and the comparatively slow progress in the functional characterization of sequenced genomes. Here we discuss two key questions of genome biology: whether we need more genomes, and how deep is our understanding of biology based on genomic analysis. We argue that overly specific annotations of gene functions are often less useful than the more generic, but also more robust, functional assignments based on protein family classification. We also discuss problems in understanding the functions of the remaining “conserved hypothetical” genes. PMID:20647113

  8. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    The genus Endornavirus comprises double-stranded RNA viruses that infect a wide range of hosts. In this study, we report the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our results demonstrate the successful application of RNA-Seq to obtain a complete viral genome sequence from transcriptome data. PMID:25792042

  9. Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions

    PubMed Central

    2013-01-01

    Background: The microsporidian parasite Nosema contributes to the steep global decline of honey bees, which are critical pollinators of food crops. Two species of Nosema have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidian species, reveal novel insights into the host-parasite interactions underlying the parasite infections. Results: We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis, which has an estimated size of 8.5 Mbp. We predicted 2,771 protein-coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. Conclusions: The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsporidian genomic data, which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulence genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite. PMID:23829473

  10. Genome sequencing and annotation of Serratia sp. strain TEL.

    PubMed

    Lephoto, Tiisetso E; Gray, Vincent M

    2015-12-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000. PMID:26697332

  11. FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption

    PubMed Central

    2015-01-01

    Background: The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden of conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment. Methods: We present a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduce two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. Results: The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Conclusions: Unlike many existing HME-based studies, in which final results need to be computed by the data owner due to the lack of a secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics. PMID:26733391
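    To make the "chi-square statistic computation" in the record above concrete, here is a minimal plaintext sketch of the standard Pearson chi-square statistic for a 2x2 case-control allele-count table. It is not the FORESEE protocol and involves no encryption; the function name and argument layout are assumptions chosen for illustration. The single division on the last line is the kind of step for which, according to the abstract, a secure division protocol is needed when the data are encrypted.

```python
def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 degree of freedom) for a 2x2 allele-count table.

    case_a, case_b: counts of allele A and allele B among cases.
    ctrl_a, ctrl_b: counts of allele A and allele B among controls.
    (Hypothetical helper; plaintext arithmetic only.)
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b

    denominator = row_case * row_ctrl * col_a * col_b
    if denominator == 0:
        raise ValueError("degenerate table: a row or column total is zero")

    # Everything up to here needs only additions and multiplications.
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2

    # The one division in the whole computation.
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(allelic_chi_square(10, 20, 20, 10))
```

    Keeping the numerator and denominator as separate integer products until the final line mirrors the point made in the abstract: the sums and products are straightforward, and the division is the operation that needs special handling.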

  12. Computational Profiling of Microbial Genomes using Short Sequences

    NASA Astrophysics Data System (ADS)

    Doering, Dale; Tsukuda, Toyoko

    2001-03-01

    The genomes of a number of microbial species have now been completely sequenced. We have developed a program for the statistical analysis of the appearance frequency and location of short DNA segments within an entire microbial genome. Using this program, the genomes of Methanococcus jannaschii (1.66 Mbase; 68% [...] radiodurans (3.28 Mbase; 66% [...] and compared to a randomly generated genomic pattern. The random sequence shows the expected statistical frequency distribution about the average, which equals the genome size divided by the total number of possible short segments of size N (4^N). In contrast, the microbial genomes are radically skewed, with a large number of segments that rarely occur and a few that are highly represented in the genome. The specific distribution profile of the segments is strongly dependent on the overall bias in the organism. The biased appearance frequency allows us to develop a genome signature of each microbial species.
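    The kind of short-segment frequency analysis described in this record can be sketched as follows. This is an illustrative reconstruction, not the authors' program: the function names are hypothetical, and the expected count uses the abstract's approximation of genome size divided by 4^N.

```python
import random
from collections import Counter


def kmer_counts(genome, n):
    """Count every overlapping segment of length n in the sequence."""
    genome = genome.upper()
    return Counter(genome[i:i + n] for i in range(len(genome) - n + 1))


def expected_count(genome_length, n):
    """Average count per segment in a uniformly random sequence.

    Uses the approximation from the abstract: genome size / 4^N.
    """
    return genome_length / 4 ** n


def random_genome(length, seed=0):
    """Uniformly random A/C/G/T sequence used as the comparison baseline."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    n = 3
    baseline = random_genome(100_000)
    counts = kmer_counts(baseline, n)
    average = expected_count(len(baseline), n)
    # In a random sequence the counts cluster around the average;
    # a real genome would be compared against the same figure.
    print(f"expected per segment: {average:.1f}")
    print("most common:", counts.most_common(3))
    print("least common:", counts.most_common()[-3:])
```

    Running the same counting function over a real genome sequence and comparing the spread of counts with the random baseline would give the skewed distribution that the abstract describes as a genome signature.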

  13. Microbial genome sequencing using optical mapping and Illumina sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  14. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  15. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing.

    PubMed

    Green, Robert C; Berg, Jonathan S; Grody, Wayne W; Kalia, Sarah S; Korf, Bruce R; Martin, Christa L; McGuire, Amy L; Nussbaum, Robert L; O'Daniel, Julianne M; Ormond, Kelly E; Rehm, Heidi L; Watson, Michael S; Williams, Marc S; Biesecker, Leslie G

    2013-07-01

    In clinical exome and genome sequencing, there is a potential for the recognition and reporting of incidental or secondary findings unrelated to the indication for ordering the sequencing but of medical value for patient care. The American College of Medical Genetics and Genomics (ACMG) recently published a policy statement on clinical sequencing that emphasized the importance of alerting the patient to the possibility of such results in pretest patient discussions, clinical testing, and reporting of results. The ACMG appointed a Working Group on Incidental Findings in Clinical Exome and Genome Sequencing to make recommendations about responsible management of incidental findings when patients undergo exome or genome sequencing. This Working Group conducted a year-long consensus process, including an open forum at the 2012 Annual Meeting and review by outside experts, and produced recommendations that have been approved by the ACMG Board. Specific and detailed recommendations, and the background and rationale for these recommendations, are described herein. The ACMG recommends that laboratories performing clinical sequencing seek and report mutations of the specified classes or types in the genes listed here. This evaluation and reporting should be performed for all clinical germline (constitutional) exome and genome sequencing, including the "normal" of tumor-normal subtractive analyses in all subjects, irrespective of age but excluding fetal samples. We recognize that there are insufficient data on penetrance and clinical utility to fully support these recommendations, and we encourage the creation of an ongoing process for updating these recommendations at least annually as further data are collected. PMID:23788249

  16. Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14.

    PubMed

    Stahl, Buffy; Barrangou, Rodolphe

    2013-01-01

    We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified. PMID:23788546

  17. The Brachypodium genome sequence: a resource for oat genomics research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Oat (Avena sativa) is an important cereal crop used as both an animal feed and for human consumption. Genetic and genomic research on oat is hindered because it is hexaploid and possesses a large (13 Gb) genome. Diploid Avena relatives have been employed for genetic and genomic studies, but only mod...

  18. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach

    PubMed Central

    Hahn, Christoph; Bachmann, Lutz; Chevreux, Bastien

    2013-01-01

    We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data—mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasites species Gyrodactylus thymalli and Gyrodactylus derjavinoides including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy to use tool even for biologists with modest bioinformatics experience. The software is made available as open source pipeline under the MIT license at https://github.com/chrishah/MITObim. PMID:23661685

  19. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Gruber, C.; Geier, B.; Haase, D.; Kaps, A.; Lemcke, K.; Mannhaupt, G.; Pfeiffer, F.; Schüller, C.; Stocker, S.; Weil, B.

    2000-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www.mips.biochem.mpg.de). PMID:10592176

  20. The diploid genome sequence of an Asian individual

    PubMed Central

    Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

    2009-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735

  1. Sequencing genomes from single cells by polymerase cloning.

    PubMed

    Zhang, Kun; Martiny, Adam C; Reppas, Nikos B; Barry, Kerrie W; Malek, Joel; Chisholm, Sallie W; Church, George M

    2006-06-01

    Genome sequencing currently requires DNA from pools of numerous nearly identical cells (clones), leaving the genome sequences of many difficult-to-culture microorganisms unattainable. We report a sequencing strategy that eliminates culturing of microorganisms by using real-time isothermal amplification to form polymerase clones (plones) from the DNA of single cells. Two Escherichia coli plones, analyzed by Affymetrix chip hybridization, demonstrate that plonal amplification is specific and the bias is randomly distributed. Whole-genome shotgun sequencing of Prochlorococcus MIT9312 plones showed 62% coverage of the genome from one plone at a sequencing depth of 3.5x, and 66% coverage from a second plone at a depth of 4.7x. Genomic regions not revealed in the initial round of sequencing are recovered by sequencing PCR amplicons derived from plonal DNA. The mutation rate in single-cell amplification is <2 x 10(-5), better than that of current genome sequencing standards. Polymerase cloning should provide a critical tool for systematic characterization of genome diversity in the biosphere. PMID:16732271
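    For context on the coverage figures in this record, the idealised Lander-Waterman model predicts that reads placed uniformly at random at depth c cover a fraction 1 - e^(-c) of a genome. The model is an assumption brought in here for comparison and is not mentioned in the abstract. A minimal sketch:

```python
import math


def expected_coverage(depth):
    """Fraction of a genome covered at a given sequencing depth,
    assuming reads land uniformly at random (Lander-Waterman model)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Depths and observed coverage as reported in the abstract.
    for depth, observed in [(3.5, 0.62), (4.7, 0.66)]:
        print(f"depth {depth}x: idealised {expected_coverage(depth):.1%}, "
              f"reported {observed:.0%}")
```

    Under this idealisation both depths would give well above 95% coverage, so the reported 62% and 66% indicate that reads from single-cell amplified DNA were not uniformly distributed over the genome.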

  2. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  3. Genome Sequence of the Trichosporon asahii Environmental Strain CBS 8904

    PubMed Central

    Li, Hai Tao; Zhu, He; Zhou, Guang Peng; Wang, Meng; Wang, Lei

    2012-01-01

    This is the first report of the genome sequence of Trichosporon asahii environmental strain CBS 8904, which was isolated from maize cobs. Comparison of the genome sequence with that of clinical strain CBS 2479 revealed that they have >99% chromosomal and mitochondrial sequence identity, yet CBS 8904 has 368 specific genes. Analysis of clusters of orthologous groups predicted that 3,307 genes belong to 23 functional categories and 703 genes were predicted to have a general function. PMID:23193141

  4. Real-time, portable genome sequencing for Ebola surveillance.

    PubMed

    Quick, Joshua; Loman, Nicholas J; Duraffour, Sophie; Simpson, Jared T; Severi, Ettore; Cowley, Lauren; Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan H J; Becker-Ziaja, Beate; Boettcher, Jan Peter; Cabeza-Cabrerizo, Mar; Camino-Sánchez, Álvaro; Carter, Lisa L; Doerrbecker, Juliane; Enkirch, Theresa; García-Dorival, Isabel; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigael; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallasch, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Y; Sachse, Andreas; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Racine, Trina; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N'Faly; Williams, Cecelia V; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Frank; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, James; Rachwal, Phillip; Turner, Daniel J; Pollakis, Georgios; Hiscox, Julian A; Matthews, David A; O'Shea, Matthew K; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Wölfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A; Koivogui, Lamine; Diallo, Boubacar; Keïta, Sakoba; Rambaut, Andrew; Formenty, Pierre; Günther, Stephan; Carroll, Miles W

    2016-02-11

    The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10(-3) and 1.42 × 10(-3) mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485
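
    The conversion from a per-site substitution rate to mutations per genome can be checked with a short calculation. This is a minimal sketch: the genome length of 18,959 nt is an assumption supplied here (the record does not state it), and the function name is illustrative.

```python
# ASSUMPTION: an EBOV genome length of about 19 kb (18,959 nt used here);
# this value is not given in the record above.
GENOME_LENGTH_NT = 18_959

def mutations_per_genome_per_year(rate_per_site_per_year: float,
                                  genome_length: int = GENOME_LENGTH_NT) -> float:
    """Scale a per-site, per-year substitution rate to the whole genome."""
    return rate_per_site_per_year * genome_length

# Rate bounds quoted in the record (mutations per site per year).
low = mutations_per_genome_per_year(0.87e-3)
high = mutations_per_genome_per_year(1.42e-3)
print(f"{low:.1f} to {high:.1f} mutations per genome per year")
```

    With that assumed length the bounds round to the 16-27 mutations per genome quoted in the abstract, which suggests the figure is a per-year value.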

  5. Whole-exome targeted sequencing of the uncharacterized pine genome.

    PubMed

    Neves, Leandro G; Davis, John M; Barbazuk, William B; Kirst, Matias

    2013-07-01

    The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of the probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm. PMID:23551702

  6. Using BLAT to find sequence similarity in closely related genomes.

    PubMed

    Bhagwat, Medha; Young, Lynn; Robison, Rex R

    2012-03-01

    The BLAST-Like Alignment Tool (BLAT) is used to find genomic sequences that match a protein or DNA sequence submitted by the user. BLAT is typically used for searching similar sequences within the same or closely related species. It was developed to align millions of expressed sequence tags and mouse whole-genome random reads to the human genome at a higher speed. It is freely available either on the Web or as a downloadable stand-alone program. BLAT search results provide a link for visualization in the University of California, Santa Cruz (UCSC) Genome Browser, where associated biological information may be obtained. Three example protocols are given: using an mRNA sequence to identify the exon-intron locations and associated gene in the genomic sequence of the same species, using a protein sequence to identify the coding regions in a genomic sequence and to search for gene family members in the same species, and using a protein sequence to find homologs in another species. PMID:22389010
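
    The first protocol mentioned above (locating exon-intron boundaries from an mRNA alignment) can be illustrated by reading the block columns of BLAT's PSL output. This is a minimal sketch, not the protocol from the record: it assumes the standard 21-column tab-separated PSL layout and a plain mRNA-versus-genome alignment, and the example record, sequence names and function names are hypothetical. Gaps between blocks may also reflect small insertions rather than introns.

```python
from typing import List, Tuple

Block = Tuple[int, int]

def psl_target_blocks(psl_line: str) -> Tuple[str, str, List[Block]]:
    """Return (target name, strand, [(start, end), ...]) for one PSL record.
    Coordinates are 0-based, half-open, on the target (genomic) sequence."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    strand, t_name = fields[8], fields[13]
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    sizes = [int(x) for x in fields[18].split(",") if x]
    t_starts = [int(x) for x in fields[20].split(",") if x]
    blocks = [(start, start + size) for start, size in zip(t_starts, sizes)]
    return t_name, strand, blocks

def gaps_between(blocks: List[Block]) -> List[Block]:
    """Target intervals between consecutive aligned blocks (candidate introns)."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]

# Hypothetical PSL record: a 250-nt mRNA aligned to chr1 in two blocks.
example = "\t".join([
    "250", "0", "0", "0", "0", "0", "1", "900", "+",
    "mRNA1", "250", "0", "250",
    "chr1", "100000", "1000", "2150",
    "2", "100,150,", "0,100,", "1000,2000,",
])

name, strand, blocks = psl_target_blocks(example)
print(name, strand, blocks, gaps_between(blocks))
```

    Keeping block extraction separate from gap calculation makes it easy to apply a minimum gap length before calling a gap an intron, if that is wanted.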

  7. Savant: genome browser for high-throughput sequencing data

    PubMed Central

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-01-01

    Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant Contact: savant@cs.toronto.edu PMID:20562449

  8. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C.; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chuyu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela; Panaud, Olivier; Kellogg, Elizabeth A.; Brutnell, Thomas P.; Doust, Andrew N.; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  9. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Yang, Xiaohan; Ye, Chuyu; Tuskan, Gerald A

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  10. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  11. Marsupial genome sequences: providing insight into evolution and disease.

    PubMed

    Deakin, Janine E

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  12. BG7: a new approach for bacterial genome annotation designed for next generation sequencing data.

    PubMed

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the preliminary assembly stage of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to obtain the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. In addition, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310

  13. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts.

    PubMed

    Otto, Thomas D; Rayner, Julian C; Böhme, Ulrike; Pain, Arnab; Spottiswoode, Natasha; Sanders, Mandy; Quail, Michael; Ollomo, Benjamin; Renaud, François; Thomas, Alan W; Prugnolle, Franck; Conway, David J; Newbold, Chris; Berriman, Matthew

    2014-01-01

    Plasmodium falciparum causes most human malaria deaths, having prehistorically evolved from parasites of African Great Apes. Here we explore the genomic basis of P. falciparum adaptation to human hosts by fully sequencing the genome of the closely related chimpanzee parasite species P. reichenowi, and obtaining partial sequence data from a more distantly related chimpanzee parasite (P. gaboni). The close relationship between P. reichenowi and P. falciparum is emphasized by almost complete conservation of genomic synteny, but against this strikingly conserved background we observe major differences at loci involved in erythrocyte invasion. The organization of most virulence-associated multigene families, including the hypervariable var genes, is broadly conserved, but P. falciparum has a smaller subset of rif and stevor genes whose products are expressed on the infected erythrocyte surface. Genome-wide analysis identifies other loci under recent positive selection, but a limited number of changes at the host-parasite interface may have mediated host switching. PMID:25203297

  14. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts

    PubMed Central

    Otto, Thomas D.; Rayner, Julian C.; Böhme, Ulrike; Pain, Arnab; Spottiswoode, Natasha; Sanders, Mandy; Quail, Michael; Ollomo, Benjamin; Renaud, François; Thomas, Alan W.; Prugnolle, Franck; Conway, David J.; Newbold, Chris; Berriman, Matthew

    2014-01-01

    Plasmodium falciparum causes most human malaria deaths, having prehistorically evolved from parasites of African Great Apes. Here we explore the genomic basis of P. falciparum adaptation to human hosts by fully sequencing the genome of the closely related chimpanzee parasite species P. reichenowi, and obtaining partial sequence data from a more distantly related chimpanzee parasite (P. gaboni). The close relationship between P. reichenowi and P. falciparum is emphasized by almost complete conservation of genomic synteny, but against this strikingly conserved background we observe major differences at loci involved in erythrocyte invasion. The organization of most virulence-associated multigene families, including the hypervariable var genes, is broadly conserved, but P. falciparum has a smaller subset of rif and stevor genes whose products are expressed on the infected erythrocyte surface. Genome-wide analysis identifies other loci under recent positive selection, but a limited number of changes at the host–parasite interface may have mediated host switching. PMID:25203297

  15. Marine organism cell biology and regulatory sequence discovery in comparative functional genomics.

    PubMed

    Barnes, David W; Mattingly, Carolyn J; Parton, Angela; Dowell, Lori M; Bayne, Christopher J; Forrest, John N

    2004-10-01

    The use of bioinformatics to integrate phenotypic and genomic data from mammalian models is well established as a means of understanding human biology and disease. Beyond direct biomedical applications of these approaches in predicting structure-function relationships between coding sequences and protein activities, comparative studies also promote understanding of molecular evolution and the relationship between genomic sequence and morphological and physiological specialization. Recently recognized is the potential of comparative studies to identify functionally significant regulatory regions and to generate experimentally testable hypotheses that contribute to understanding mechanisms that regulate gene expression, including transcriptional activity, alternative splicing and transcript stability. Functional tests of hypotheses generated by computational approaches require experimentally tractable in vitro systems, including cell cultures. Comparative sequence analysis strategies that use genomic sequences from a variety of evolutionarily diverse organisms are critical for identifying conserved regulatory motifs in the 5'-upstream, 3'-downstream and introns of genes. Genomic sequences and gene orthologues in the first aquatic vertebrate and protovertebrate organisms to be fully sequenced (Fugu rubripes, Ciona intestinalis, Tetraodon nigroviridis, Danio rerio) as well as in the elasmobranchs, spiny dogfish shark (Squalus acanthias) and little skate (Raja erinacea), and marine invertebrate models such as the sea urchin (Strongylocentrotus purpuratus) are valuable in the prediction of putative genomic regulatory regions. Cell cultures have been derived for these and other model species. Data and tools resulting from these kinds of studies will contribute to understanding transcriptional regulation of biomedically important genes and provide new avenues for medical therapeutics and disease prevention. PMID:19003267

  16. Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...

  17. An international plan to sequence the nuclear genome of onion

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As large-scale DNA sequencing technologies become more efficient and less costly, the genomic DNAs of more and more plants are being sequenced, assembled, and annotated. These complete sequences are extremely valuable for the identification of specific genes associated with important phenotypes. Thi...

  18. Complete genome sequence of Spirosoma linguale type strain (1T)

    SciTech Connect

    Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth H; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Detter, J. Chris; Schutze, Andrea; Rohde, Manfred; Tindall, Brian; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-01-01

    Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    PubMed Central

    Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304674

  3. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Draft genome sequence of Enterococcus faecium strain LMG 8148.

    PubMed

    Michiels, Joran E; Van den Bergh, Bram; Fauvart, Maarten; Michiels, Jan

    2016-01-01

    Enterococcus faecium, traditionally considered a harmless gut commensal, is emerging as an important nosocomial pathogen showing increasing rates of multidrug resistance. We report the draft genome sequence of E. faecium strain LMG 8148, isolated in 1968 from a human in Gothenburg, Sweden. The draft genome has a total length of 2,697,490 bp, a GC-content of 38.3 %, and 2,402 predicted protein-coding sequences. The isolation of this strain predates the emergence of E. faecium as a nosocomial pathogen. Consequently, its genome can be useful in comparative genomic studies investigating the evolution of E. faecium as a pathogen. PMID:27610213

  5. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats

    PubMed Central

    van der Weide, Robin H.; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts. PMID:27501045
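
    A first step in this kind of analysis, collecting unmapped reads and keeping the high-quality ones, might look like the sketch below. It is not the authors' pipeline: it assumes alignments in SAM text format (where FLAG bit 0x4 marks an unmapped read) and Phred+33 base qualities, and the mean-quality threshold of 20, the example records and the function names are all hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"

def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, base qualities) for unmapped SAM records."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed records
            continue
        if int(fields[1]) & UNMAPPED_FLAG:
            yield fields[0], fields[9], fields[10]

def mean_phred(qualities: str) -> float:
    """Mean base quality, ASSUMING Phred+33 encoding."""
    if not qualities or qualities == "*":
        return 0.0
    return sum(ord(c) - 33 for c in qualities) / len(qualities)

def high_quality_unmapped(sam_lines: Iterable[str],
                          min_mean_quality: float = 20.0) -> List[str]:
    """Names of unmapped reads whose mean quality meets a (hypothetical) cutoff."""
    return [name for name, _seq, qual in unmapped_reads(sam_lines)
            if mean_phred(qual) >= min_mean_quality]

# Hypothetical records: one mapped read and two unmapped reads.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTAA\t####",
]
print(high_quality_unmapped(example_sam))
```

    The reads that pass such a filter would then go on to the enrichment and similarity-based analyses described in the abstract, which this sketch does not attempt to reproduce.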

  6. Genome sequencing: a systematic review of health economic evidence

    PubMed Central

    2013-01-01

    Recently the sequencing of the human genome has become a major biological and clinical research field. However, the public health impact of this new technology with focus on the financial effect cannot yet be foreseen. To provide an overview of the current health economic evidence for genome sequencing, we conducted a thorough systematic review of the literature from 17 databases. In addition, we conducted a hand search. Starting with 5 520 records we ultimately included five full-text publications and one internet source, all focused on cost calculations. The results were very heterogeneous and, therefore, difficult to compare. Furthermore, because the methodology of the publications was quite poor, the reliability and validity of the results were questionable. The real costs for the whole sequencing workflow, including data management and analysis, remain unknown. Overall, our review indicates that the current health economic evidence for genome sequencing is quite poor. Therefore, we listed aspects that needed to be considered when conducting health economic analyses of genome sequencing. Thereby, specifics regarding the overall aim, technology, population, indication, comparator, alternatives after sequencing, outcomes, probabilities, and costs with respect to genome sequencing are discussed. For further research, at the outset, a comprehensive cost calculation of genome sequencing is needed, because all further health economic studies rely on valid cost data. The results will serve as an input parameter for budget-impact analyses or cost-effectiveness analyses. PMID:24330507

  7. Sequence and Organization of the Neodiprion lecontei Nucleopolyhedrovirus Genome

    PubMed Central

    Lauzon, Hilary A. M.; Lucarotti, Christopher J.; Krell, Peter J.; Feng, Qili; Retnakaran, Arthur; Arif, Basil M.

    2004-01-01

    All fully sequenced baculovirus genomes, with the exception of the dipteran Culex nigripalpus nucleopolyhedrovirus (CuniNPV), have previously been from Lepidoptera. This study reports the sequencing and characterization of a hymenopteran baculovirus, Neodiprion lecontei nucleopolyhedrovirus (NeleNPV), from the redheaded pine sawfly. NeleNPV has the smallest genome so far published (81,755 bp) and has a GC content of only 33.3%. It contains 89 potential open reading frames, 43 with baculovirus homologues, 6 identified by conserved domains, and 1 with homology to a densovirus structural protein. Average amino acid identity of homologues ranged from 19.7% with CuniNPV to 24.9% with Spodoptera exigua nucleopolyhedrovirus. The conserved set of baculovirus genes has dropped to 29, since NeleNPV lacks an F protein homologue (ac23/ld130). NeleNPV contains 12 conserved lepidopteran baculovirus genes, including that for DNA binding protein, late expression factor 11 (lef-11), polyhedrin, occlusion derived virus envelope protein-18 (odv-e18), p40, and p45, but lacks 21 others, including lef-3, me53, immediate early gene-1, lef-6, pp31, odv-e66, few polyhedra 25k, odv-e25, protein kinase-1, fibroblast growth factor, and ubiquitin. The lack of identified baculovirus homologues may be due to difficulties in identification, differences in host-virus interactions, or other genes performing similar functions. Gene parity plots showed limited colinearity of NeleNPV with other baculoviruses, and phylogenetic analysis indicates that NeleNPV may have existed before the lepidopteran nucleopolyhedrovirus and granulovirus divergence. The creation of two new Baculoviridae genera to fit hymenopteran and dipteran baculoviruses may be necessary. PMID:15194779

  8. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

  9. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination. PMID:16381882

  10. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2 and 71.0% of coding sequence regions, and hence we remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal