Science.gov

Sample records for genome sequence migs

  1. The minimum information about a genome sequence (MIGS) specification

    PubMed Central

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; dePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; Gil, Ingio San; Wilson, Gareth; Wipat, Anil

    2008-01-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases. PMID:18464787

  2. The minimum information about a genome sequence (MIGS) specification.

    PubMed

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; DePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; San Gil, Ingio; Wilson, Gareth; Wipat, Anil

    2008-05-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

  3. MIG: Multi-Image Genome viewer.

    PubMed

    McGowan, Simon J; Hughes, Jim R; Han, Zong-Pei; Taylor, Stephen

    2013-10-01

    Multi-Image Genome (MIG) viewer is a web-based application for visualizing, querying and filtering many thousands of genome browser regions as well as for exporting the data in a variety of formats. This methodology has been used successfully to analyze ChIP-Seq data and RNA-Seq data and to detect somatic mutations in genome resequencing projects. MIG is available at https://mig.molbiol.ox.ac.uk/mig/

  4. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard

    PubMed Central

    Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver

    2011-01-01

    In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864

  5. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard.

    PubMed

    Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver

    2011-04-29

    In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the "Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)" checklist for the description of genomes and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These "machine-readable" reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences.

  6. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform

    PubMed Central

    Suyama, Yoshihisa; Matsuki, Yu

    2015-01-01

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239

  7. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform.

    PubMed

    Suyama, Yoshihisa; Matsuki, Yu

    2015-11-23

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities.

  8. A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).

    PubMed

    Kottmann, Renzo; Gray, Tanya; Murphy, Sean; Kagan, Leonid; Kravitz, Saul; Lombardot, Thierry; Field, Dawn; Glöckner, Frank Oliver

    2008-06-01

    The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).

  9. Genome Sequencing.

    PubMed

    Verma, Mansi; Kulshrestha, Samarth; Puri, Ayush

    2017-01-01

    Genome sequencing is an important step toward correlating genotypes with phenotypic characters. Sequencing technologies are important in many fields in the life sciences, including functional genomics, transcriptomics, oncology, evolutionary biology, forensic sciences, and many more. The era of sequencing has been divided into three generations. First generation sequencing involved sequencing by synthesis (Sanger sequencing) and sequencing by cleavage (Maxam-Gilbert sequencing). Sanger sequencing led to the completion of various genome sequences (including human) and provided the foundation for development of other sequencing technologies. Since then, various techniques have been developed which can overcome some of the limitations of Sanger sequencing. These techniques are collectively known as "Next-generation sequencing" (NGS), and are further classified into second and third generation technologies. Although NGS methods have many advantages in terms of speed, cost, and parallelism, the accuracy and read length of Sanger sequencing is still superior and has confined the use of NGS mainly to resequencing genomes. Consequently, there is a continuing need to develop improved real time sequencing techniques. This chapter reviews some of the options currently available and provides a generic workflow for sequencing a genome.

  10. Submitting MIGS, MIMS, MIENS Information to EMBL and Standards and the Sequencing Pipelines of the Gordon and Betty Moore Foundation (GSC8 Meeting)

    ScienceCinema

    Vaughan, Bob [EMBL; Kaye, Jon [Gordon and Betty Moore Foundation

    2016-07-12

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding "Research Coordination Network" from the National Science Foundation and was organized held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Bob Vaughan of EMBL on submitting MIGS/MIMS/MIENS information to EMBL-EBI's system, followed by a brief talk from Jon Kaye of the Gordon and Betty Moore Foundation on standards and the foundation's sequencing pipelines at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif. on Sept. 9, 2009

  11. Whole Genome Sequencing

    MedlinePlus

    ... you want to learn. Search form Search Whole Genome Sequencing You are here Home Testing & Services Testing ... the full story, click here . What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  12. Sequencing technologies and genome sequencing.

    PubMed

    Pareek, Chandra Shekhar; Smoczynski, Rafal; Tretyn, Andrzej

    2011-11-01

    The high-throughput - next generation sequencing (HT-NGS) technologies are currently the hottest topic in the field of human and animals genomics researches, which can produce over 100 times more data compared to the most sophisticated capillary sequencers based on the Sanger method. With the ongoing developments of high throughput sequencing machines and advancement of modern bioinformatics tools at unprecedented pace, the target goal of sequencing individual genomes of living organism at a cost of $1,000 each is seemed to be realistically feasible in the near future. In the relatively short time frame since 2005, the HT-NGS technologies are revolutionizing the human and animal genome researches by analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome wide structural variation, de novo assembling and re-assembling of genome, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genome and personal genomics. In this review, we addressed the important features of HT-NGS like, first generation DNA sequencers, birth of HT-NGS, second generation HT-NGS platforms, third generation HT-NGS platforms: including single molecule Heliscope™, SMRT™ and RNAP sequencers, Nanopore, Archon Genomics X PRIZE foundation, comparison of second and third HT-NGS platforms, applications, advances and future perspectives of sequencing technologies on human and animal genome research.

  13. Genomic Sequencing in Cancer

    PubMed Central

    Tuna, Musaffe; Amos, Christopher I.

    2013-01-01

    Genomic sequencing has provided critical insights into the etiology of both simple and complex diseases. The enormous reductions in cost for whole genome sequencing have allowed this technology to gain increasing use. Whole genome analysis has impacted research of complex diseases including cancer by allowing the systematic analysis of entire genomes in a single experiment, thereby facilitating the discovery of somatic and germline mutations, and identification of the function and impact of the insertions, deletions, and structural rearrangements, including translocations and inversions, in novel disease genes. Whole-genome sequencing can be used to provide the most comprehensive characterization of the cancer genome, the complexity of which we are only beginning to understand. Hence in this review, we focus on whole-genome sequencing in cancer. PMID:23178448

  14. Complete genome sequence of Candidatus Ruthia magnifica.

    PubMed

    Roeselers, Guus; Newton, Irene L G; Woyke, Tanja; Auchtung, Thomas A; Dilly, Geoffrey F; Dutton, Rachel J; Fisher, Meredith C; Fontanez, Kristina M; Lau, Evan; Stewart, Frank J; Richardson, Paul M; Barry, Kerrie W; Saunders, Elizabeth; Detter, John C; Wu, Dongying; Eisen, Jonathan A; Cavanaugh, Colleen M

    2010-10-27

    The hydrothermal vent clam Calyptogena magnifica (Bivalvia: Mollusca) is a member of the Vesicomyidae. Species within this family form symbioses with chemosynthetic Gammaproteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a rudimentary gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. The C. magnifica symbiont, Candidatus Ruthia magnifica, was the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced (Newton et al. 2007). Here we expand upon the original report and provide additional details complying with the emerging MIGS/MIMS standards. The complete genome exposed the genetic blueprint of the metabolic capabilities of the symbiont. Genes which were predicted to encode the proteins required for all the metabolic pathways typical of free-living chemoautotrophs were detected in the symbiont genome. These include major pathways including carbon fixation, sulfur oxidation, nitrogen assimilation, as well as amino acid and cofactor/vitamin biosynthesis. This genome sequence is invaluable in the study of these enigmatic associations and provides insights into the origin and evolution of autotrophic endosymbiosis.

  15. Prenatal Whole Genome Sequencing

    PubMed Central

    Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

    2014-01-01

    With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means. PMID:22777977

  16. Toward nanoscale genome sequencing.

    PubMed

    Ryan, Declan; Rahimi, Maryam; Lund, John; Mehta, Ranjana; Parviz, Babak A

    2007-09-01

    This article reports on the state-of-the-art technologies that sequence DNA using miniaturized devices. The article considers the miniaturization of existing technologies for sequencing DNA and the opportunities for cost reduction that 'on-chip' devices can deliver. The ability to construct nano-scale structures and perform measurements using novel nano-scale effects has provided new opportunities to identify nucleotides directly using physical, and not chemical, methods. The challenges that these technologies need to overcome to provide a US$1000-genome sequencing technology are also presented.

  17. Sequencing the maize genome.

    PubMed

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  18. Towards Sequencing Cotton (Gossypium) Genomes

    USDA-ARS?s Scientific Manuscript database

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  19. Multiplexed fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  20. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  1. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  2. Targeted sequencing of plant genomes

    Treesearch

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  3. Venturia carpophila draft genome sequence

    USDA-ARS?s Scientific Manuscript database

    Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...

  4. Toward Universal Forward Genetics: Using a Draft Genome Sequence of the Nematode Oscheius tipulae To Identify Mutations Affecting Vulva Development

    PubMed Central

    Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne

    2017-01-01

    Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. PMID:28630114

  5. Whole-exome/genome sequencing and genomics.

    PubMed

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  6. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  7. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  8. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  9. Genome sequences and great expectations

    PubMed Central

    Iliopoulos, Ioannis; Tsoka, Sophia; Andrade, Miguel A; Janssen, Paul; Audit, Benjamin; Tramontano, Anna; Valencia, Alfonso; Leroy, Christophe; Sander, Chris; Ouzounis, Christos A

    2001-01-01

    To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function. PMID:11178275

  10. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534

  11. migS, a cis-acting site that affects bipolar positioning of oriC on the Escherichia coli chromosome

    PubMed Central

    Yamaichi, Yoshiharu; Niki, Hironori

    2004-01-01

    During replication of the Escherichia coli chromosome, the replicated Ori domains migrate towards opposite cell poles, suggesting that a cis-acting site for bipolar migration is located in this region. To identify this cis-acting site, a series of mutants was constructed by splitting subchromosomes from the original chromosome. One mutant, containing a 720 kb subchromosome, was found to be defective in the bipolar positioning of oriC. The creation of deletion mutants allowed the identification of migS, a 25 bp sequence, as the cis-acting site for the bipolar positioning of oriC. When migS was located at the replication terminus, the chromosomal segment showed bipolar positioning. migS was able to rescue bipolar migration of plasmid DNA containing a mutation in the SopABC partitioning system. Interestingly, multiple copies of the migS sequence on a plasmid in trans inhibited the bipolar positioning of oriC. Taken together, these findings indicate that migS plays a crucial role in the bipolar positioning of oriC. In addition, real-time analysis of the dynamic morphological changes of nucleoids in wild-type and migS mutants suggests that bipolar positioning of the replicated oriC contributes to nucleoid organization. PMID:14685268

  12. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  13. Fusicladium effusum draft genome sequence

    USDA-ARS?s Scientific Manuscript database

    The pecan scab fungus (Fusicladium effusum [G. Winter]) is an economically important pathogen of pecan (Carya illinoinensis [Wangenh]. K. Koch), on account of its impact on yield and quality of valuable nutmeats. We describe the first draft genome sequence of F. effusum, the characteristics of annot...

  14. MIG Operations in Korea.

    DTIC Science & Technology

    1986-03-01

    second phase of operations saw the introduction of MIG pincer and envelopment tactics. Often a force of 60 to 80 MIGs would cross from Manchuria over...southward-bound F-86s. An additional force of MIGs usually would come straight down the jaws of the pincers as far south as 21 Sinanjy in time to provide...increasingly difficult to bring him to grips . By no means did engagements keep pace with the enemy sortie rate. For all indications the Reds were pursuing a

  15. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria▿

    PubMed Central

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteriacomprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  16. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    PubMed

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  17. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc2155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  18. Genome Sequence of Mycobacteriophage Momo.

    PubMed

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  19. Personal genome sequencing: current approaches and challenges

    PubMed Central

    Snyder, Michael; Du, Jiang; Gerstein, Mark

    2010-01-01

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., “personal genomes.” Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences. PMID:20194435

  20. Sequencing crop genomes: approaches and applications

    USDA-ARS?s Scientific Manuscript database

    Plant genome sequencing methodology parrallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So called 'next generation' technologies such as short rea...

  1. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  2. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  3. Whole-Genome Sequencing: Manual Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-01-03

    This protocol describes a manual approach for the preparation of genomic DNA libraries suitable for Illumina sequencing. Genomic DNA fragments produced by shearing by sonication are ligated to adaptors and amplified by polymerase chain reaction (PCR). The amplified DNA, separated by size and gel-purified, is suitable for use as template in whole-genome sequencing.

  4. SP8 Sequencing Extinct Genomes

    PubMed Central

    Poinar, H.

    2007-01-01

    Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are comparatively weak molecules from other cellular debris, and thus evolutionary biologists are in essence time trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. Sequestering of DNA molecules in humic or Maillard-like complexes likely represents a rich source of DNA molecules from the past, which have yet to be tapped. These molecules were impossible to acquire due to the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, which, in theory, allow for the identification of all short nucleotide sequences in a sample in a non-selective approach, have the potential to allow the identification of all nucleic acids in a sample, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our recent efforts underway to sequence the wooly mammoth genome as well as other fossil remains.

  5. Draft Genome Sequence of Lactobacillus rhamnosus 2166

    PubMed Central

    Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains. PMID:24558254

  6. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  7. [Construction of pORF-mIG and activity evaluation of MIG in vitro].

    PubMed

    Zhang, Ru; Tian, Ling; Xiao, Fei; Wen, Yan-jun; li, Jiong; Hou, Jian-mei; Zhang, Ling; Li, Gang; Yao, Bing; Chen, Xian-cheng; Mel, Kai

    2006-07-01

    To establish a transformant of monokine induced by interferon-gamma (MIG) with the eukaryotic expression vector for further investigating the efficacy of its use in antitumor gene therapy. The MIG full-length cDNA was amplified from pBLAST-MIG and was cloned into the eukaryote expression vector pORF-mcs, and the resulted recombinant plasmid was named pORF-MIG. The recombinant pORF-MIG was determined with restriction enzyme and sequencing, and then it was transfected into COS-7 cells by Lipfectamin. Expression of the transformant was detected by immunoblot assay, and the bioactivity was assessed by chemotaxis assay. The restriction analysis and the sequence determination confirmed that the recombinant pORF-MIG contained the MIG full-length cDNA. And the transformants of pORF-MIG expressed the MIG protein which could apparently attract the activated lymphocytes. The recombinant pORF-MIG was constructed successfully, and this recombinant eukaryotic expression vector could be used in additional studies on the biological effect of MIG and its use in anti-tumor gene therapy.

  8. Snake Genome Sequencing: Results and Future Prospects

    PubMed Central

    Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

    2016-01-01

    Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957

  9. Comprehensive genome sequencing of the liver cancer genome.

    PubMed

    Nakagawa, Hidewaki; Shibata, Tatsuhiro

    2013-11-01

    Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. Recently, comprehensive whole genome and exome sequencing analyses for HCC revealed new cancer-associated genes and a variety of genomic alterations. In particular, frequent genetic alterations of the chromatin remodeling genes were observed, suggesting a new potential therapeutic target for HCC. Sequencing analysis has further identified the molecular complexities of multicentric lesions and intratumoral heterogeneity. Detailed analyses of the somatic substitution pattern of the cancer genome and the HBV virus genome integration sites by using whole-genome sequencing will elucidate the molecular basis and diverse etiological factors involved in liver cancer development.

  10. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  11. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    PubMed Central

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences. PMID:22247527

  12. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    PubMed

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  13. Towards a reference pecan genome sequence

    USDA-ARS?s Scientific Manuscript database

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  14. Maize genome sequencing by methylation filtration.

    PubMed

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  15. Human genome sequencing in health and disease.

    PubMed

    Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

    2012-01-01

    Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

  16. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  17. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence are often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74 % with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornavirus, we support the proposal that PaBV-5 be assigned to a new bornavirus species:- Psittaciform 2 bornavirus.

  18. GenomeCons: a web server for manipulating multiple genome sequence alignments and their consensus sequences.

    PubMed

    Sato, Tetsuya; Suyama, Mikita

    2015-04-15

    Genome sequence alignments provide valuable information on many aspects of molecular biological processes. In this study, we developed a web server, GenomeCons, for manipulating multiple genome sequence alignments and their consensus sequences for high-throughput genome sequence analyses. This server facilitates the visual inspection of multiple genome sequence alignments for a set of genomic intervals at a time. This allows the user to examine how these sites are evolutionarily conserved over time for their functional importance. The server also reports consensus sequences for the input genomic intervals, which can be applied to downstream analyses such as the identification of common motifs in the regions determined by ChIP-seq experiments. GenomeCons is freely accessible at http://bioinfo.sls.kyushu-u.ac.jp/genomecons/ mikita@bioreg.kyushu-u.ac.jp. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  20. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  1. Completely phased genome sequencing through chromosome sorting

    PubMed Central

    Yang, Hong; Chen, Xi; Wong, Wing Hung

    2011-01-01

    The two haploid genome sequences that a person inherits from the two parents represent the most fundamentally useful type of genetic information for the study of heritable diseases and the development of personalized medicine. Because of the difficulty in obtaining long-range phase information, current sequencing methods are unable to provide this information. Here, we introduce and show feasibility of a scalable approach capable of generating genomic sequences completely phased across the entire chromosome. PMID:21169219

  2. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  3. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the {approximately}120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes {approximately}13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  4. The genome sequence of Drosophila melanogaster.

    PubMed

    Adams, M D; Celniker, S E; Holt, R A; Evans, C A; Gocayne, J D; Amanatides, P G; Scherer, S E; Li, P W; Hoskins, R A; Galle, R F; George, R A; Lewis, S E; Richards, S; Ashburner, M; Henderson, S N; Sutton, G G; Wortman, J R; Yandell, M D; Zhang, Q; Chen, L X; Brandon, R C; Rogers, Y H; Blazej, R G; Champe, M; Pfeiffer, B D; Wan, K H; Doyle, C; Baxter, E G; Helt, G; Nelson, C R; Gabor, G L; Abril, J F; Agbayani, A; An, H J; Andrews-Pfannkoch, C; Baldwin, D; Ballew, R M; Basu, A; Baxendale, J; Bayraktaroglu, L; Beasley, E M; Beeson, K Y; Benos, P V; Berman, B P; Bhandari, D; Bolshakov, S; Borkova, D; Botchan, M R; Bouck, J; Brokstein, P; Brottier, P; Burtis, K C; Busam, D A; Butler, H; Cadieu, E; Center, A; Chandra, I; Cherry, J M; Cawley, S; Dahlke, C; Davenport, L B; Davies, P; de Pablos, B; Delcher, A; Deng, Z; Mays, A D; Dew, I; Dietz, S M; Dodson, K; Doup, L E; Downes, M; Dugan-Rocha, S; Dunkov, B C; Dunn, P; Durbin, K J; Evangelista, C C; Ferraz, C; Ferriera, S; Fleischmann, W; Fosler, C; Gabrielian, A E; Garg, N S; Gelbart, W M; Glasser, K; Glodek, A; Gong, F; Gorrell, J H; Gu, Z; Guan, P; Harris, M; Harris, N L; Harvey, D; Heiman, T J; Hernandez, J R; Houck, J; Hostin, D; Houston, K A; Howland, T J; Wei, M H; Ibegwam, C; Jalali, M; Kalush, F; Karpen, G H; Ke, Z; Kennison, J A; Ketchum, K A; Kimmel, B E; Kodira, C D; Kraft, C; Kravitz, S; Kulp, D; Lai, Z; Lasko, P; Lei, Y; Levitsky, A A; Li, J; Li, Z; Liang, Y; Lin, X; Liu, X; Mattei, B; McIntosh, T C; McLeod, M P; McPherson, D; Merkulov, G; Milshina, N V; Mobarry, C; Morris, J; Moshrefi, A; Mount, S M; Moy, M; Murphy, B; Murphy, L; Muzny, D M; Nelson, D L; Nelson, D R; Nelson, K A; Nixon, K; Nusskern, D R; Pacleb, J M; Palazzolo, M; Pittman, G S; Pan, S; Pollard, J; Puri, V; Reese, M G; Reinert, K; Remington, K; Saunders, R D; Scheeler, F; Shen, H; Shue, B C; Sidén-Kiamos, I; Simpson, M; Skupski, M P; Smith, T; Spier, E; Spradling, A C; Stapleton, M; Strong, R; Sun, E; Svirskas, R; Tector, C; Turner, R; Venter, E; Wang, A H; Wang, X; Wang, Z Y; Wassarman, D A; Weinstock, G M; Weissenbach, J; Williams, S M; WoodageT; Worley, K C; Wu, D; Yang, S; Yao, Q A; Ye, J; Yeh, R F; Zaveri, J S; Zhan, M; Zhang, G; Zhao, Q; Zheng, L; Zheng, X H; Zhong, F N; Zhong, W; Zhou, X; Zhu, S; Zhu, X; Smith, H O; Gibbs, R A; Myers, E W; Rubin, G M; Venter, J C

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  5. Cancer genome-sequencing study design.

    PubMed

    Mwenifumbo, Jill C; Marra, Marco A

    2013-05-01

    Discoveries from cancer genome sequencing have the potential to translate into advances in cancer prevention, diagnostics, prognostics, treatment and basic biology. Given the diversity of downstream applications, cancer genome-sequencing studies need to be designed to best fulfil specific aims. Knowledge of second-generation cancer genome-sequencing study design also facilitates assessment of the validity and importance of the rapidly growing number of published studies. In this Review, we focus on the practical application of second-generation sequencing technology (also known as next-generation sequencing) to cancer genomics and discuss how aspects of study design and methodological considerations - such as the size and composition of the discovery cohort - can be tailored to serve specific research aims.

  6. Strategies for complete plastid genome sequencing.

    PubMed

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  7. Coupled amplification and sequencing of genomic DNA.

    PubMed Central

    Ruano, G; Kidd, K K

    1991-01-01

    Addition of dideoxyribonucleotides during the exponential phase of the PCR should result in the synthesis of two complementary sequence ladders. We have explored this hypothesis to develop coupled amplification and sequencing of genomic DNA. Coupled amplification and sequencing is a biphasic method for sequencing both strands of template as they are amplified. Stage I selects and amplifies a single target from the genomic DNA sample. Stage II accomplishes the sequencing as well as additional amplification of the target using aliquots from the stage I reaction mixed with end-labeled primer and dideoxynucleotides. We have successfully applied coupled amplification and sequencing to a 300-base-pair fragment 4 kilobases upstream from HOX2B directly from human whole genomic DNA. Images PMID:1672768

  8. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  9. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  10. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  11. First Complete Genome Sequence of Corynebacterium riegelii

    PubMed Central

    Greninger, Alexander L.; Streithorst, Jessica

    2017-01-01

    ABSTRACT Here, we report the first complete genome sequence of Corynebacterium riegelii strain PUDD_83A45, isolated from the urine of a patient with urinary tract infection. The genome measured 2.56 Mb and contained no plasmid. PMID:28360160

  12. Multiple Genome Sequences of Lactobacillus plantarum Strains.

    PubMed

    Kafka, Thomas A; Geissler, Andreas J; Vogel, Rudi F

    2017-07-20

    We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provide the basis for consecutive analyses. Copyright © 2017 Kafka et al.

  13. Multiple Genome Sequences of Lactobacillus plantarum Strains

    PubMed Central

    Kafka, Thomas A.; Geissler, Andreas J.

    2017-01-01

    ABSTRACT We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provide the basis for consecutive analyses. PMID:28729269

  14. Virtually sequenced: The next genomic generation

    SciTech Connect

    Bains, W.

    1996-06-01

    The announcement of {open_quotes}virtual genomics{close_quotes} requires evaluation of the efficiency and accuracy of computer-generated sequencing efforts. {open_quotes}Digital Northerns{close_quotes}, or Northern blot electrophoresis done in the realm of computer data, have been developed by Incyte Pharmaceuticals (Palo Alto, CA) and Human Genome Sciences (Rockville, MD). 12 refs., 2 figs.

  15. Refined Pichia pastoris reference genome sequence.

    PubMed

    Sturmberger, Lukas; Chappell, Thomas; Geier, Martina; Krainer, Florian; Day, Kasey J; Vide, Ursa; Trstenjak, Sara; Schiefer, Anja; Richardson, Toby; Soriaga, Leah; Darnhofer, Barbara; Birner-Gruenberger, Ruth; Glick, Benjamin S; Tolstorukov, Ilya; Cregg, James; Madden, Knut; Glieder, Anton

    2016-10-10

    Strains of the species Komagataella phaffii are the most frequently used "Pichia pastoris" strains employed for recombinant protein production as well as studies on peroxisome biogenesis, autophagy and secretory pathway analyses. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps from 20 to 6000 base pairs were closed and 5111 out of 5256 putative open reading frames were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new open reading frames (ORFs) and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76kbp between two previous gaps on chromosome 1 and another 134kbp fragment at the end of chromosome 4, as well as several shorter fragments needed re-orientation. In total more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5kbp found in some strains of P. pastoris.

  16. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  17. Sequencing and comparing whole mitochondrial genomes ofanimals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  18. Genomic sequencing of Pleistocene cave bears.

    PubMed

    Noonan, James P; Hofreiter, Michael; Smith, Doug; Priest, James R; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J Chris; Pääbo, Svante; Rubin, Edward M

    2005-07-22

    Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of approximately 1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs.

  19. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the

  20. From sequence mapping to genome assemblies.

    PubMed

    Otto, Thomas D

    2015-01-01

    The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse, e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics-the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.

  1. Direct determination of diploid genome sequences

    PubMed Central

    Weisenfeld, Neil I.; Kumar, Vijay; Shah, Preyas; Church, Deanna M.; Jaffe, David B.

    2017-01-01

    Determining the genome sequence of an organism is challenging, yet fundamental to understanding its biology. Over the past decade, thousands of human genomes have been sequenced, contributing deeply to biomedical research. In the vast majority of cases, these have been analyzed by aligning sequence reads to a single reference genome, biasing the resulting analyses, and in general, failing to capture sequences novel to a given genome. Some de novo assemblies have been constructed free of reference bias, but nearly all were constructed by merging homologous loci into single “consensus” sequences, generally absent from nature. These assemblies do not correctly represent the diploid biology of an individual. In exactly two cases, true diploid de novo assemblies have been made, at great expense. One was generated using Sanger sequencing, and one using thousands of clone pools. Here, we demonstrate a straightforward and low-cost method for creating true diploid de novo assemblies. We make a single library from ∼1 ng of high molecular weight DNA, using the 10x Genomics microfluidic platform to partition the genome. We applied this technique to seven human samples, generating low-cost HiSeq X data, then assembled these using a new “pushbutton” algorithm, Supernova. Each computation took 2 d on a single server. Each yielded contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb. Our method provides a scalable capability for determining the actual diploid genome sequence in a sample, opening the door to new approaches in genomic biology and medicine. PMID:28381613

  2. My Identical Twin Sequenced our Genome.

    PubMed

    Schilit, Samantha L P; Schilit Nitenson, Arielle

    2017-04-01

    With rapidly declining costs, whole genome sequencing is becoming feasible for widespread use. Although cost-effectiveness is driving increased use of the technology, comprehensive recommendations on how to handle ethical dilemmas have yet to reach a consensus. In this article, Sam shares her experience of undergoing whole genome sequencing. Despite the deeply private nature of the test, the results do not solely belong to Sam; her identical twin sister, Arielle, shares virtually the same genome and received results without a formal consent process. This article explores their parallel experiences as a way of highlighting the controversial ethics of a private test with familial implications.

  3. Harnessing Whole Genome Sequencing in Medical Mycology.

    PubMed

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  4. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group, and Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only few differences compared to partial sequences of modern strains. PMID:26212363

  5. Complete Genome Sequence of Lleida Bat Lyssavirus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Wise, Emma L.; Aréchiga-Ceballos, Nidia; Freuling, Conrad M.; Banyard, Ashley C.; McElhinney, Lorraine M.; de Lamballerie, Xavier; Müller, Thomas; Echevarría, Juan E.

    2017-01-01

    ABSTRACT All lyssaviruses (family Rhabdoviridae) cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Using next-generation sequencing, the full-genome sequence for a novel lyssavirus, Lleida bat lyssavirus (LLEBV), from the original brain of a common bent-winged bat has been confirmed. PMID:28082487

  6. Genome Sequence of Pseudomonas chlororaphis Strain 189.

    PubMed

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M; Dumonceaux, Tim J

    2016-06-23

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. Copyright © 2016 Town et al.

  7. Draft Genome Sequences of Elizabethkingia meningoseptica

    PubMed Central

    Matyi, Stephanie A.; Hoyt, Peter R.; Hosoyama, Akira; Yamazoe, Atsushi; Fujita, Nobuyuki

    2013-01-01

    Elizabethkingia meningoseptica is ubiquitous in nature, exhibits a multiple-antibiotic resistance phenotype, and causes rare opportunistic infections. We now report two draft genome sequences of E. meningoseptica type strains that were sequenced independently in two laboratories. PMID:23846266

  8. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  9. Sequence composition and genome organization of maize

    PubMed Central

    Messing, Joachim; Bharti, Arvind K.; Karlowski, Wojciech M.; Gundlach, Heidrun; Kim, Hye Ran; Yu, Yeisoo; Wei, Fusheng; Fuks, Galina; Soderlund, Carol A.; Mayer, Klaus F. X.; Wing, Rod A.

    2004-01-01

    Zea mays L. ssp. mays, or corn, one of the most important crops and a model for plant genetics, has a genome ≈80% the size of the human genome. To gain global insight into the organization of its genome, we have sequenced the ends of large insert clones, yielding a cumulative length of one-eighth of the genome with a DNA sequence read every 6.2 kb, thereby describing a large percentage of the genes and transposable elements of maize in an unbiased approach. Based on the accumulative 307 Mb of sequence, repeat sequences occupy 58% and genic regions occupy 7.5%. A conservative estimate predicts ≈59,000 genes, which is higher than in any other organism sequenced so far. Because the sequences are derived from bacterial artificial chromosome clones, which are ordered in overlapping bins, tagged genes are also ordered along continuous chromosomal segments. Based on this positional information, roughly one-third of the genes appear to consist of tandemly arrayed gene families. Although the ancestor of maize arose by tetraploidization, fewer than half of the genes appear to be present in two orthologous copies, indicating that the maize genome has undergone significant gene loss since the duplication event. PMID:15388850

  10. Biodiversity, genomes, and DNA sequence databases.

    PubMed

    Leipe, D D

    1996-12-01

    There are approximately 1.4 million organisms on this planet that have been described morphologically but there is no comparable coverage of biodiversity at the molecular level. Little more than 1% of the known species have been subject to any molecular scrutiny and eukaryotic genome projects have focused on a group of closely related model organisms. The past year, however, has seen an approximately 80% increase in the number of species represented in sequence databases and the completion of the sequencing of three prokaryotic genomes. Large-scale sequencing projects seem set to begin coverage of a wider range of the eukaryotic diversity, including green plants, microsporidians and diplomonads.

  11. Comparison of 61 Sequenced Escherichia coli Genomes

    PubMed Central

    Lukjancenko, Oksana; Wassenaar, Trudy M.

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publically available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or ‘accessory’ genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group of Enterobacteriaceae. PMID:20623278

  12. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a ``text'' consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into ``words'' and a ``dictionary'' of ``words'' is built concurrently. For the control regions in the yeast genome, we built a ``dictionary'' of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.

  13. Genome Sequence of Yersinia pestis KIM†

    PubMed Central

    Deng, Wen; Burland, Valerie; Plunkett III, Guy; Boutin, Adam; Mayhew, George F.; Liss, Paul; Perna, Nicole T.; Rose, Debra J.; Mau, Bob; Zhou, Shiguo; Schwartz, David C.; Fetherston, Jaqueline D.; Lindler, Luther E.; Brubaker, Robert R.; Plano, Gregory V.; Straley, Susan C.; McDonough, Kathleen A.; Nilles, Matthew L.; Matson, Jyl S.; Blattner, Frederick R.; Perry, Robert D.

    2002-01-01

    We present the complete genome sequence of Yersinia pestis KIM, the etiologic agent of bubonic and pneumonic plague. The strain KIM, biovar Mediaevalis, is associated with the second pandemic, including the Black Death. The 4.6-Mb genome encodes 4,198 open reading frames (ORFs). The origin, terminus, and most genes encoding DNA replication proteins are similar to those of Escherichia coli K-12. The KIM genome sequence was compared with that of Y. pestis CO92, biovar Orientalis, revealing homologous sequences but a remarkable amount of genome rearrangement for strains so closely related. The differences appear to result from multiple inversions of genome segments at insertion sequences, in a manner consistent with present knowledge of replication and recombination. There are few differences attributable to horizontal transfer. The KIM and E. coli K-12 genome proteins were also compared, exposing surprising amounts of locally colinear “backbone,” or synteny, that is not discernible at the nucleotide level. Nearly 54% of KIM ORFs are significantly similar to K-12 proteins, with conserved housekeeping functions. However, a number of E. coli pathways and transport systems and at least one global regulator were not found, reflecting differences in lifestyle between them. In KIM-specific islands, new genes encode candidate pathogenicity proteins, including iron transport systems, putative adhesins, toxins, and fimbriae. PMID:12142430

  14. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  15. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    USDA-ARS?s Scientific Manuscript database

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  16. Genome sequence of Phytophthora ramorum: implications for management

    Treesearch

    Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore

    2006-01-01

    A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...

  17. Genome Update. Let the consumer beware: Streptomyces genome sequence quality.

    PubMed

    Studholme, David J

    2016-01-01

    A genome sequence assembly represents a model of a genome. This article explores some tools and methods for assessing the quality of an assembly, using publicly available data for Streptomyces species as the example. There is great variability in quality of assemblies deposited in GenBank. Only in a small minority of these assemblies are the raw data available, enabling full appraisal of the assembly quality.

  18. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  19. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  20. Accelerating Genome Sequencing 100X with FPGAs

    SciTech Connect

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

  1. Genomic Sequence Variation Markup Language (GSVML).

    PubMed

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  2. Sorghum Genome Sequencing by Methylation Filtration

    PubMed Central

    Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rohlfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W. Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  3. Sorghum genome sequencing by methylation filtration.

    PubMed

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  4. Mig-6 controls EGFR trafficking and suppresses gliomagenesis.

    PubMed

    Ying, Haoqiang; Zheng, Hongwu; Scott, Kenneth; Wiedemeyer, Ruprecht; Yan, Haiyan; Lim, Carol; Huang, Joseph; Dhakal, Sabin; Ivanova, Elena; Xiao, Yonghong; Zhang, Hailei; Hu, Jian; Stommel, Jayne M; Lee, Michelle A; Chen, An-Jou; Paik, Ji-Hye; Segatto, Oreste; Brennan, Cameron; Elferink, Lisa A; Wang, Y Alan; Chin, Lynda; DePinho, Ronald A

    2010-04-13

    Glioblastoma multiforme (GBM) is the most common and lethal primary brain cancer that is driven by aberrant signaling of growth factor receptors, particularly the epidermal growth factor receptor (EGFR). EGFR signaling is tightly regulated by receptor endocytosis and lysosome-mediated degradation, although the molecular mechanisms governing such regulation, particularly in the context of cancer, remain poorly delineated. Here, high-resolution genomic profiles of GBM identified a highly recurrent focal 1p36 deletion encompassing the putative tumor suppressor gene, Mig-6. We show that Mig-6 quells the malignant potential of GBM cells and dampens EGFR signaling by driving EGFR into late endosomes and lysosome-mediated degradation upon ligand stimulation. Mechanistically, this effect is mediated by the binding of Mig-6 to a SNARE protein STX8, a protein known to be required for late endosome trafficking. Thus, Mig-6 functions to ensure recruitment of internalized receptor to late endosomes and subsequently the lysosomal degradation compartment through its ability to specifically link EGFR and STX8 during ligand-stimulated EGFR trafficking. In GBM, the highly frequent loss of Mig-6 would therefore serve to sustain aberrant EGFR-mediated oncogenic signaling. Together, these data uncover a unique tumor suppression mechanism involving the regulation of receptor trafficking.

  5. Mig-6 controls EGFR trafficking and suppresses gliomagenesis

    PubMed Central

    Ying, Haoqiang; Zheng, Hongwu; Scott, Kenneth; Wiedemeyer, Ruprecht; Yan, Haiyan; Lim, Carol; Huang, Joseph; Dhakal, Sabin; Ivanova, Elena; Xiao, Yonghong; Zhang, Hailei; Hu, Jian; Stommel, Jayne M.; Lee, Michelle A.; Chen, An-Jou; Paik, Ji-Hye; Segatto, Oreste; Brennan, Cameron; Elferink, Lisa A.; Wang, Y. Alan; Chin, Lynda; DePinho, Ronald A.

    2010-01-01

    Glioblastoma multiforme (GBM) is the most common and lethal primary brain cancer that is driven by aberrant signaling of growth factor receptors, particularly the epidermal growth factor receptor (EGFR). EGFR signaling is tightly regulated by receptor endocytosis and lysosome-mediated degradation, although the molecular mechanisms governing such regulation, particularly in the context of cancer, remain poorly delineated. Here, high-resolution genomic profiles of GBM identified a highly recurrent focal 1p36 deletion encompassing the putative tumor suppressor gene, Mig-6. We show that Mig-6 quells the malignant potential of GBM cells and dampens EGFR signaling by driving EGFR into late endosomes and lysosome-mediated degradation upon ligand stimulation. Mechanistically, this effect is mediated by the binding of Mig-6 to a SNARE protein STX8, a protein known to be required for late endosome trafficking. Thus, Mig-6 functions to ensure recruitment of internalized receptor to late endosomes and subsequently the lysosomal degradation compartment through its ability to specifically link EGFR and STX8 during ligand-stimulated EGFR trafficking. In GBM, the highly frequent loss of Mig-6 would therefore serve to sustain aberrant EGFR-mediated oncogenic signaling. Together, these data uncover a unique tumor suppression mechanism involving the regulation of receptor trafficking. PMID:20351267

  6. Complete genome sequence of Caulobacter crescentus

    PubMed Central

    Nierman, William C.; Feldblyum, Tamara V.; Laub, Michael T.; Paulsen, Ian T.; Nelson, Karen E.; Eisen, Jonathan; Heidelberg, John F.; Alley, M. R. K.; Ohta, Noriko; Maddock, Janine R.; Potocka, Isabel; Nelson, William C.; Newton, Austin; Stephens, Craig; Phadke, Nikhil D.; Ely, Bert; DeBoy, Robert T.; Dodson, Robert J.; Durkin, A. Scott; Gwinn, Michelle L.; Haft, Daniel H.; Kolonay, James F.; Smit, John; Craven, M. B.; Khouri, Hoda; Shetty, Jyoti; Berry, Kristi; Utterback, Teresa; Tran, Kevin; Wolf, Alex; Vamathevan, Jessica; Ermolaeva, Maria; White, Owen; Salzberg, Steven L.; Venter, J. Craig; Shapiro, Lucy; Fraser, Claire M.

    2001-01-01

    The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus. PMID:11259647

  7. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  8. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  9. The complete genome sequence of Mycobacterium bovis

    PubMed Central

    Garnier, Thierry; Eiglmeier, Karin; Camus, Jean-Christophe; Medina, Nadine; Mansoor, Huma; Pryor, Melinda; Duthoy, Stephanie; Grondin, Sophie; Lacroix, Celine; Monsempe, Christel; Simon, Sylvie; Harris, Barbara; Atkin, Rebecca; Doggett, Jon; Mayes, Rebecca; Keating, Lisa; Wheeler, Paul R.; Parkhill, Julian; Barrell, Bart G.; Cole, Stewart T.; Gordon, Stephen V.; Hewinson, R. Glyn

    2003-01-01

    Mycobacterium bovis is the causative agent of tuberculosis in a range of animal species and man, with worldwide annual losses to agriculture of $3 billion. The human burden of tuberculosis caused by the bovine tubercle bacillus is still largely unknown. M. bovis was also the progenitor for the M. bovis bacillus Calmette–Guérin vaccine strain, the most widely used human vaccine. Here we describe the 4,345,492-bp genome sequence of M. bovis AF2122/97 and its comparison with the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Strikingly, the genome sequence of M. bovis is >99.95% identical to that of M. tuberculosis, but deletion of genetic information has led to a reduced genome size. Comparison with M. leprae reveals a number of common gene losses, suggesting the removal of functional redundancy. Cell wall components and secreted proteins show the greatest variation, indicating their potential role in host–bacillus interactions or immune evasion. Furthermore, there are no genes unique to M. bovis, implying that differential gene expression may be the key to the host tropisms of human and bovine bacilli. The genome sequence therefore offers major insight on the evolution, host preference, and pathobiology of M. bovis. PMID:12788972

  10. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  11. [Mapping and human genome sequence program].

    PubMed

    Weissenbach, J

    1997-03-01

    Until recently, human genome programs focused primarily on establishing maps that would provide signposts to researchers seeking to identify genes responsible for inherited diseases, as well as a basis for genome sequencing studies. Preestablished gene mapping goals have been reached. The over 7,000 microsatellite markers identified to date provide a map of sufficient density to allow localization of the gene of a monogenic disease with a precision of 1 to 2 million base pairs. The physical map, based on systematically arranged overlapping sets of artificial yeast chromosomes (YACs), has also made considerable headway during the last few years. The most recently published map covers more than 90% of the genome. However, currently available physical maps cannot be used for sequencing studies because multiple rearrangements occur in YACs. The recently developed sets of radioinduced hybrids are extremely useful for incorporating genes into existing maps. A network of American and European laboratories has successfully used these radioinduced hybrids to map 15,000 gene tags from large-scale cDNA library sequencing programs. There are increasingly pressing reasons for initiating large scale human genome sequencing studies.

  12. Draft Genome Sequence of Lactobacillus plantarum 2165

    PubMed Central

    Abramov, Vyacheslav M.

    2014-01-01

    This report describes a draft genome sequence of Lactobacillus plantarum 2165. The data demonstrate the presence of a large number of genes responsible for sugar metabolism and the fermentation activity of this bacterium. Different cell surface proteins, including fibronectin and mucus-binding adhesins, may contribute to the beneficial probiotic properties of this strain. PMID:24407651

  13. Genome sequence of Bacillus licheniformis WX-02.

    PubMed

    Yangtse, Wuming; Zhou, Yinhua; Lei, Yang; Qiu, Yimin; Wei, Xuetuan; Ji, Zhixia; Qi, Gaofu; Yong, Yangchun; Chen, Lingling; Chen, Shouwen

    2012-07-01

    Bacillus licheniformis is an important bacterium that has been used extensively for large-scale industrial production of exoenzymes and peptide antibiotics. B. licheniformis WX-02 produces poly-gamma-glutamate increasingly when fermented under stress conditions. Here its genome sequence (4,270,104 bp, with G+C content of 46.06%), which comprises a circular chromosome, is announced.

  14. Draft genome sequence of Bacillus oceanisediminis 2691.

    PubMed

    Lee, Yong-Jik; Lee, Sang-Jae; Jeong, Haeyoung; Kim, Hyun Ju; Ryu, Naeun; Kim, Byoung-Chan; Lee, Han-Seung; Lee, Dong-Woo; Lee, Sang Jun

    2012-11-01

    Bacillus oceanisediminis 2691 is an aerobic, Gram-positive, spore-forming, and moderately halophilic bacterium that was isolated from marine sediment of the Yellow Sea coast of South Korea. Here, we report the draft genome sequence of B. oceanisediminis 2691 that may have an important role in the bioremediation of marine sediment.

  15. Complete Genome Sequences of 61 Mycobacteriophages

    PubMed Central

    2016-01-01

    Mycobacteriophages—viruses of mycobacteria—provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc2155 that expand our understanding of phage diversity. PMID:27389257

  16. Whole Genome Sequence of a Turkish Individual

    PubMed Central

    Dogan, Haluk; Can, Handan; Otu, Hasan H.

    2014-01-01

    Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We have performed 35x coverage using paired-end sequencing, where over 95% of sequencing reads are mapped to the reference genome covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homo/hetero zygosity (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion ratios (2,383,204∶1,154,590 or 2.06∶1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from −52 bp to 34 bp in length and causing about 180 codon insertion/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefits research in genetics and medicine and should be conducted on a larger scale. PMID:24416366

  17. Hardware accelerator for genomic sequence alignment.

    PubMed

    Chiang, Jason; Studniberg, Michael; Shaw, Jack; Seto, Shaw; Truong, Kevin

    2006-01-01

    To infer homology and subsequently gene function, the Smith-Waterman algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain billions of sequences, this algorithm becomes computationally expensive. Consequently, in this paper, we focused on accelerating the Smith-Waterman algorithm by modifying the computationally repeated portion of the algorithm by FPGA hardware custom instructions. These simple modifications accelerated the algorithm runtime by an average of 287% compared to the pure software implementation. Therefore, further design of FPGA accelerated hardware offers a promising direction to seeking runtime improvement of genomic database searching.

  18. A Draft Sequence of the Neandertal Genome

    PubMed Central

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  19. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create can expressed sequence tag or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.

  20. Dominant short repeated sequences in bacterial genomes.

    PubMed

    Avershina, Ekaterina; Rudi, Knut

    2015-03-01

    We use a novel multidimensional searching approach to present the first exhaustive search for all possible repeated sequences in 166 genomes selected to cover the bacterial domain. We found an overrepresentation of repeated sequences in all but one of the genomes. The most prevalent repeats by far were related to interspaced short palindromic repeats (CRISPRs)—conferring bacterial adaptive immunity. We identified a deep branching clade of thermophilic Firmicutes containing the highest number of CRISPR repeats. We also identified a high prevalence of tandem repeated heptamers. In addition, we identified GC-rich repeats that could potentially be involved in recombination events. Finally, we identified repeats in a 16322 amino acid mega protein (involved in biofilm formation) and inverted repeats flanking miniature transposable elements (MITEs). In conclusion, the exhaustive search for repeated sequences identified new elements and distribution of these, which has implications for understanding both the ecology and evolution of bacteria.

  1. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    USDA-ARS?s Scientific Manuscript database

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  2. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  3. Agaricus bisporus genome sequence: a commentary.

    PubMed

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus biporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium.

  4. Genome sequence of Aspergillus luchuensis NBRC 4314

    PubMed Central

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. In contrast, that yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and N-and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. PMID:27651094

  5. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    PubMed Central

    Herring, Christopher D; Palsson, Bernhard Ø

    2007-01-01

    Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions. PMID:17697331

  6. Synaptotagmin gene content of the sequenced genomes.

    PubMed

    Craxton, Molly

    2004-07-06

    Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their great diversification in animals. Synaptotagmins occur in

  7. Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

    PubMed Central

    Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

    2012-01-01

    We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20×coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene compliments revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480

  8. The first genome sequences of human bocaviruses from Vietnam

    PubMed Central

    2016-01-01

    As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the pathogen. PMID:28090592

  9. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  10. [Inhibitory effect of Mig-7 silencing by retrovirus-mediated shRNA on vasculogenic mimicry, invasion and metastasis of human hepatocellular carcinoma cells in vitro].

    PubMed

    Qu, Bo; Sheng, Guan-Nan; Yu, Fei; Chen, Guan-Nan; Lv, Qi; Mao, Zhong-Peng; Guo, Long; Lv, Yi

    2016-11-20

    To explore the inhibitory effect of migration-inducing gene 7 (Mig-7) gene silencing induced by retroviral-mediated small hairpin RNA (shRNA) on vasculogenic mimicry (VM), invasion and metastasis of human hepatocellular carcinoma (HCC) cells in vitro. Two target sequences (Mig-7 shRNA-1 and Mig-7 shRNA-2) and one negative control sequence (Mig-7 shRNA-N) were synthesized. The recombinant retroviral vectors carrying Mig-7 shRNA were constructed, and HCC cell line MHCC-97H were transfected with Mig-7 shRNA-1, Mig-7 shRNA-2, Mig-7 shRNA-N, or the empty vector, or treated with 125 µg/mL recombinant human endostatin (ES). Mig-7 expression in the treated cells was detected using semi-quantitative PCR and Western blotting. The inhibitory effect of Mig-7 silencing on VM formation was investigated in a 3-dimensional cell culture system; the changes in cell adhesion, invasion and migration were assessed with intercellular adhesion assay, Transwell invasion assay and Transwell migration assay, respectively. The expression of Mig-7 at both mRNA and protein levels decreased significantly, VM formation, invasion and metastasis were suppressed, while intercellular adhesion increased significantly in MHCC-97H cells in Mig-7 shRNA-1 and Mig-7 shRNA-2 groups (P<0.05); such changes were not observed in cells transfected with Mig-7 shRNA-N or the empty vector, nor in cells treated with ES. Mig-7 silencing by retroviral-mediated shRNA significantly inhibits VM formation, invasion and metastasis and increases the intercellular adhesion of the HCC cells, while ES does not have such inhibitory effects.

  11. Benchmark dataset for Whole Genome sequence compression.

    PubMed

    C L, Biji; Nair, Achuthsankar

    2016-05-16

    The research in DNA data compression lacks a standard dataset to test out compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression is unable to be bench marked in the absence of such scientifically compiled whole genome sequence dataset and proposes a bench mark dataset using multistage sampling procedure. Considering the genome sequence of organisms available in the National Centre for Biotechnology and Information (NCBI) as the universe, the proposed dataset selects 1105 prokaryotes, 200 plasmids, 164 viruses and 65 eukaryotes. This paper reports the results of using 3 established tools on the newly compiled dataset and show that their strength and weakness are evident only with a comparison based on the scientifically compiled bench mark data set.

  12. Complete Genome Sequence of Ikoma Lyssavirus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.

    2012-01-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection. PMID:22923801

  13. Simple sequence repeats in prokaryotic genomes

    PubMed Central

    Mrázek, Jan; Guo, Xiangxue; Shah, Apurva

    2007-01-01

    Simple sequence repeats (SSRs) in DNA sequences are composed of tandem iterations of short oligonucleotides and may have functional and/or structural properties that distinguish them from general DNA sequences. They are variable in length because of slip-strand mutations and may also affect local structure of the DNA molecule or the encoded proteins. Long SSRs (LSSRs) are common in eukaryotes but rare in most prokaryotes. In pathogens, SSRs can enhance antigenic variance of the pathogen population in a strategy that counteracts the host immune response. We analyze representations of SSRs in >300 prokaryotic genomes and report significant differences among different prokaryotes as well as among different types of SSRs. LSSRs composed of short oligonucleotides (1–4 bp length, designated LSSR1–4) are often found in host-adapted pathogens with reduced genomes that are not known to readily survive in a natural environment outside the host. In contrast, LSSRs composed of longer oligonucleotides (5–11 bp length, designated LSSR5–11) are found mostly in nonpathogens and opportunistic pathogens with large genomes. Comparisons among SSRs of different lengths suggest that LSSR1–4 are likely maintained by selection. This is consistent with the established role of some LSSR1–4 in enhancing antigenic variance. By contrast, abundance of LSSR5–11 in some genomes may reflect the SSRs' general tendency to expand rather than their specific role in the organisms' physiology. Differences among genomes in terms of SSR representations and their possible interpretations are discussed. PMID:17485665

  14. Draft Genome Sequence of Actinomyces massiliensis Strain 4401292T

    PubMed Central

    Robert, Catherine; Gimenez, Grégory; Gharbi, Reem; Raoult, Didier

    2012-01-01

    A draft genome sequence of Actinomyces massiliensis, an anaerobic bacterium isolated from a patient's blood culture, is described here. CRISPR-associated proteins, insertion sequences, and toxin-antitoxin loci were found on the genome. PMID:22933754

  15. Whole-genome sequencing in bacteriology: state of the art.

    PubMed

    Dark, Michael J

    2013-10-08

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics.

  16. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics. PMID:24143115

  17. Genome Sequence of Aerococcus viridans LL1

    PubMed Central

    Qin, Nan; Zheng, Beiwen; Yang, Fengling; Chen, Yanfei; Guo, Jing; Hu, Xinjun

    2012-01-01

    Aerococcus viridans is a catalase-negative Gram-positive bacterium and has been described as an airborne organism widely distributed in the hospital environment or in clinical specimens. We isolated A. viridans strain LL1 from indoor dust samples collected by a patient. Here, we prepared a genome sequence for this strain consisting of 31 contigs totaling 1,994,039 bases and a GC content of 39.42%. PMID:22815455

  18. Genome sequence of Aerococcus viridans LL1.

    PubMed

    Qin, Nan; Zheng, Beiwen; Yang, Fengling; Chen, Yanfei; Guo, Jing; Hu, Xinjun; Li, Lanjuan

    2012-08-01

    Aerococcus viridans is a catalase-negative Gram-positive bacterium and has been described as an airborne organism widely distributed in the hospital environment or in clinical specimens. We isolated A. viridans strain LL1 from indoor dust samples collected by a patient. Here, we prepared a genome sequence for this strain consisting of 31 contigs totaling 1,994,039 bases and a GC content of 39.42%.

  19. Complete Genome Sequences of 138 Mycobacteriophages

    PubMed Central

    2012-01-01

    Bacteriophages are the most numerous biological entities in the biosphere, and although their genetic diversity is high, it remains ill defined. Mycobacteriophages—the viruses of mycobacterial hosts—provide insights into this diversity as well as tools for manipulating Mycobacterium tuberculosis. We report here the complete genome sequences of 138 new mycobacteriophages, which—together with the 83 mycobacteriophages previously reported—represent the largest collection of phages known to infect a single common host, Mycobacterium smegmatis mc2 155. PMID:22282335

  20. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    SciTech Connect

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N{sub 2} as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H{sub 2}. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  1. DNA methylation detection: bisulfite genomic sequencing analysis.

    PubMed

    Li, Yuanyuan; Tollefsbol, Trygve O

    2011-01-01

    DNA methylation, which most commonly occurs at the C5 position of cytosines within CpG dinucleotides, plays a pivotal role in many biological procedures such as gene expression, embryonic development, cellular proliferation, differentiation, and chromosome stability. Aberrant DNA methylation is often associated with loss of DNA homeostasis and genomic instability leading to the development of human diseases such as cancer. The importance of DNA methylation creates an urgent demand for effective methods with high sensitivity and reliability to explore innovative diagnostic and therapeutic strategies. Bisulfite genomic sequencing developed by Frommer and colleagues was recognized as a revolution in DNA methylation analysis based on conversion of genomic DNA by using sodium bisulfite. Besides various merits of the bisulfite genomic sequencing method such as being highly qualitative and quantitative, it serves as a fundamental principle to many derived methods to better interpret the mystery of DNA methylation. Here, we present a protocol currently frequently used in our laboratory that has proven to yield optimal outcomes. We also discuss the potential technical problems and troubleshooting notes for a variety of applications in this field.

  2. Genome sequence of Halobacterium species NRC-1

    PubMed Central

    Ng, Wailap Victor; Kennedy, Sean P.; Mahairas, Gregory G.; Berquist, Brian; Pan, Min; Shukla, Hem Dutt; Lasky, Stephen R.; Baliga, Nitin S.; Thorsson, Vesteinn; Sbrogna, Jennifer; Swartzell, Steven; Weir, Douglas; Hall, John; Dahl, Timothy A.; Welti, Russell; Goo, Young Ah; Leithauser, Brent; Keller, Kim; Cruz, Randy; Danson, Michael J.; Hough, David W.; Maddocks, Deborah G.; Jablonski, Peter E.; Krebs, Mark P.; Angevine, Christine M.; Dale, Heather; Isenbarger, Thomas A.; Peck, Ronald F.; Pohlschroder, Mechthild; Spudich, John L.; Jung, Kwang-Hwan; Alam, Maqsudul; Freitas, Tracey; Hou, Shaobin; Daniels, Charles J.; Dennis, Patrick P.; Omer, Arina D.; Ebhardt, Holger; Lowe, Todd M.; Liang, Ping; Riley, Monica; Hood, Leroy; DasSarma, Shiladitya

    2000-01-01

    We report the complete sequence of an extreme halophile, Halobacterium sp. NRC-1, harboring a dynamic 2,571,010-bp genome containing 91 insertion sequences representing 12 families and organized into a large chromosome and 2 related minichromosomes. The Halobacterium NRC-1 genome codes for 2,630 predicted proteins, 36% of which are unrelated to any previously reported. Analysis of the genome sequence shows the presence of pathways for uptake and utilization of amino acids, active sodium-proton antiporter and potassium uptake systems, sophisticated photosensory and signal transduction pathways, and DNA replication, transcription, and translation systems resembling more complex eukaryotic organisms. Whole proteome comparisons show the definite archaeal nature of this halophile with additional similarities to the Gram-positive Bacillus subtilis and other bacteria. The ease of culturing Halobacterium and the availability of methods for its genetic manipulation in the laboratory, including construction of gene knockouts and replacements, indicate this halophile can serve as an excellent model system among the archaea. PMID:11016950

  3. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  4. Identification of ancient remains through genomic sequencing

    PubMed Central

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  5. Swine Genome Sequencing Consortium (SGSC): A Strategic Roadmap for Sequencing The Pig Genome

    PubMed Central

    Schook, Lawrence B.; Beever, Jonathan E.; Rogers, Jane; Humphray, Sean; Archibald, Alan; Chardon, Patrick; Milan, Denis; Rohrer, Gary; Eversole, Kellye

    2005-01-01

    The Swine Genome Sequencing Consortium (SGSC) was formed in September 2003 by academic, government and industry representatives to provide international coordination for sequencing the pig genome. The SGSC’s mission is to advance biomedical research for animal production and health by the development of DNAbased tools and products resulting from the sequencing of the swine genome. During the past 2 years, the SGSC has met bi-annually to develop a strategic roadmap for creating the required scientific resources, to integrate existing physical maps, and to create a sequencing strategy that captured international participation and a broad funding base. During the past year, SGSC members have integrated their respective physical mapping data with the goal of creating a minimal tiling path (MTP) that will be used as the sequencing template. During the recent Plant and Animal Genome meeting (January 16, 2005 San Diego, CA), presentations demonstrated that a human–pig comparative map has been completed, BAC fingerprint contigs (FPC) for each of the autosomes and X chromosome have been constructed and that BAC end-sequencing has permitted, through BLAST analysis and RH-mapping, anchoring of the contigs. Thus, significant progress has been made towards the creation of a MTP. In addition, whole-genome (WG) shotgun libraries have been constructed and are currently being sequenced in various laboratories around the globe. Thus, a hybrid sequencing approach in which 3x coverage of BACs comprising the MTP and 3x of the WG-shotgun libraries will be used to develop a draft 6x coverage of the pig genome. PMID:18629187

  6. Genome sequence of Aspergillus luchuensis NBRC 4314.

    PubMed

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-Ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-12-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. In contrast, that yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and N-and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  7. Transforming clinical microbiology with bacterial genome sequencing.

    PubMed

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  8. Transforming clinical microbiology with bacterial genome sequencing

    PubMed Central

    2016-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263

  9. Comparison between hybrid laser-MIG welding and MIG welding for the invar36 alloy

    NASA Astrophysics Data System (ADS)

    Zhan, Xiaohong; Li, Yubo; Ou, Wenmin; Yu, Fengyi; Chen, Jie; Wei, Yanhong

    2016-11-01

    The invar36 alloy is suitable to produce mold of composite materials structure because it has similar thermal expansion coefficient with composite materials. In the present paper, the MIG welding and laser-MIG hybrid welding methods are compared to get the more appropriate method to overcome the poor weldability of invar36 alloy. According to the analysis of the experimental and simulated results, it has been proved that the Gauss and cone combined heat source model can characterize the laser-MIG hybrid welding heat source well. The total welding time of MIG welding is 8 times that of hybrid laser-MIG welding. The welding material consumption of MIG welding is about 4 times that of hybrid laser-MIG welding. The stress and deformation simulation indicate that the peak value of deformation during MIG welding is 3 times larger than that of hybrid laser-MIG welding.

  10. Energy balance in MIG arcs

    NASA Astrophysics Data System (ADS)

    Schnick, M.; Hertel, M.; Fuessel, U.; Uhrlandt, D.

    2013-06-01

    Recent studies of metal inert gas (MIG) processes by spectroscopy and fluid simulations have shown that metal evaporation causes a specific spatial structure of the arc, and among others a minimum of plasma temperature at the arc centre. Changes in the arc structure and in the heat transfer to the material are closely connected with the arc energy balance; its detailed analysis has not been carried out so far under the specific impact of metal vapour. In this paper, magnetohydrodynamic (MHD) simulations of an MIG arc in argon including iron evaporation at the wire tip are considered. The main terms in the energy balance are discussed focusing on a comparison of the arc regions with and without metal vapour. In addition, a simple approach of the energy balance at a cross section of the MIG arc is proposed where all details of the heat transport are neglected. The MHD model and the simplified approach are in good agreement and clearly demonstrate that the specific structure in an MIG arc is mainly caused by the different temperature dependence of the plasma radiation and the electrical conductivity in argon or in argon mixtures with iron vapour.

  11. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  12. Porcine parvovirus: DNA sequence and genome organization.

    PubMed

    Ranz, A I; Manclús, J J; Díaz-Aroca, E; Casal, J I

    1989-10-01

    We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization of the PPV genome is similar to those of other autonomous parvoviruses. There are two large open reading frames (ORFs) that almost entirely cover the genome, both located in the same frame of the complementary strand. The left ORF encodes the non-structural protein NS1 and the right ORF encodes the capsid proteins (VP1, VP2 and VP3). Promoter analysis, location of splicing sites and putative amino acid sequences for the viral proteins show a high homology of PPV with feline panleukopenia virus and canine parvoviruses (FPV and CPV) and rodent parvovirus. Therefore we conclude that PPV is related to the Kilham rat virus (KRV) group of autonomous parvoviruses formed by KRV, minute virus of mice, Lu III, H-1, FPV and CPV.

  13. Viral sequences integrated into plant genomes.

    PubMed

    Harper, Glyn; Hull, Roger; Lockhart, Ben; Olszewski, Neil

    2002-01-01

    Sequences of various DNA plant viruses have been found integrated into the host genome. There are two forms of integrant, those that can form episomal viral infections and those that cannot. Integrants of three pararetroviruses, Banana streak virus (BSV), Tobacco vein clearing virus (TVCV), and Petunia vein clearing virus (PVCV), can generate episomal infections in certain hybrid plant hosts in response to stress. In the case of BSV and TVCV, one of the parents contains the integrant but is has not been seen to be activated in that parent; the other parent does not contain the integrant. The number of integrant loci is low for BSV and PVCV and high in TVCV. The structure of the integrants is complex, and it is thought that episomal virus is released by recombination and/or reverse transcription. Geminiviral and pararetroviral sequences are found in plant genomes although not so far associated with a virus disease. It appears that integration of viral sequences is widespread in the plant kingdom and has been occurring for a long period of time.

  14. Draft Genome Sequence of Mycobacterium chimaera Type ...

    EPA Pesticide Factsheets

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  15. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs are found conserved during the evolution, among which except one exists in both liverworts and mosses, all others are shared only by the two liverworts, mosses or hornworts. SSRs are known as DNA tracts having high mutation rates; however, according to our observations, they still can evolve slowly. The conservativeness of these SSRs suggests that they are under strong selection and could play critical roles in maintaining the gene functions.

  16. Aligning Two Genomic Sequences That Contain Duplications

    NASA Astrophysics Data System (ADS)

    Hou, Minmei; Riemer, Cathy; Berman, Piotr; Hardison, Ross C.; Miller, Webb

    It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

  17. Rickettsia felis, from culture to genome sequencing.

    PubMed

    Ogata, H; Robert, C; Audic, S; Robineau, S; Blanc, G; Fournier, P E; Renesto, P; Claverie, J M; Raoult, D

    2005-12-01

    Rickettsia felis has been recently cultured in XTC2 cells. This allows production of enough bacteria to create a genomic bank and to sequence it. The chromosome of R. felis is longer than that of previously sequenced rickettsiae and it possess 2 plasmids. Microscopically, this bacterium exhibits two forms of pili: one resembles a conjugative pilus and another forms hair-like projections that may play a role in pathogenicity. R. felis also exhibits several copies of ankyrin-repeat genes and tetratricopeptide encoding gene that are specifically linked to pathogenic host-associated bacteria. It also contains toxin-antitoxin system encoding genes that are extremely rare in intracellular bacteria and may be linked to plasmid maintenance.

  18. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  19. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  20. Research ethics and the challenge of whole-genome sequencing

    PubMed Central

    McGuire, Amy L.; Caulfield, Timothy; Cho, Mildred K.

    2008-01-01

    The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants’ relatives; and the future use of samples and data taken for whole-genome sequencing. Although the issues are not new, we discuss their implications for personal genomics and provide recommendations for appropriate management in the context of research involving individual whole-genome sequencing. PMID:18087293

  1. Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

    PubMed Central

    Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

    2013-01-01

    Background With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is presented by the ratio of chromosome size and genome size (or between scaffold size and genome size), which ranged from 55.31% to nearly 100%. The accuracy of genome sequence was presented by the ratio between matched EST and selected ESTs where 52.93% ∼ 98.28% and 89.02% ∼ 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influences genome sequence assembly. Conclusion The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published. PMID:23922843

  2. Draft Genome Sequence of the Fungus Trametes hirsuta 072.

    PubMed

    Pavlov, Andrey R; Tyazhelova, Tatiana V; Moiseenko, Konstantin V; Vasina, Daria V; Mosunova, Olga V; Fedorova, Tatiana V; Maloshenok, Lilya G; Landesman, Elena O; Bruskin, Sergei A; Psurtseva, Nadezhda V; Slesarev, Alexei I; Kozyavkin, Sergei A; Koroleva, Olga V

    2015-11-19

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). Copyright © 2015 Pavlov et al.

  3. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  4. Detecting long tandem duplications in genomic sequences

    PubMed Central

    2012-01-01

    Background Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, a software using a flow based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR,a we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non coding elements such as pseudo-genes or RNA genes. Conclusions ReD Tandem allows to identify large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based which ignores duplications involving non coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also allow to improve existing annotations. PMID:22568762

  5. Rapid whole genome sequencing and precision neonatology.

    PubMed

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  6. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  7. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  8. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This lead to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotropicum, a species of great relevance to energy-rich gas production.

  9. Identification and annotation of repetitive sequences in fungal genomes

    USDA-ARS?s Scientific Manuscript database

    Cheaper and faster sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to the ever-increasing volume of genomic data. This has been paralleled by an increase in computational power and resources to process and translate raw sequence data int...

  10. First Complete Genome Sequence of Cherry virus A

    PubMed Central

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki

    2016-01-01

    The 5′-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5′-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  11. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  12. Draft Genome Sequence of Bacillus amyloliquefaciens B-1895

    PubMed Central

    Melnikov, Vyacheslav G.; Chistyakov, Vladimir A.

    2014-01-01

    In this report, we present a draft genome sequence of Bacillus amyloliquefaciens strain B-1895. Comparison with the genome of a reference strain demonstrated similar overall organization, as well as differences involving large gene clusters. PMID:24948774

  13. Draft Genome Sequence of Brevibacterium massiliense Strain 541308T

    PubMed Central

    Robert, Catherine; Gimenez, Grégory; Raoult, Didier

    2012-01-01

    A draft genome sequence of Brevibacterium massiliense, an aerobic bacterium isolated from a human ankle discharge, is described here. CRISPR-associated proteins were found to be encoded in the genome, and analysis of transport proteins was performed. PMID:22933772

  14. Genome sequencing of the important oilseed crop Sesamum indicum L

    PubMed Central

    2013-01-01

    The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described. PMID:23369264

  15. Racial/Ethnic Disparities in Genomic Sequencing.

    PubMed

    Spratt, Daniel E; Chan, Tiffany; Waldron, Levi; Speers, Corey; Feng, Felix Y; Ogunwobi, Olorunseun O; Osborne, Joseph R

    2016-08-01

    Although poorly understood, there is heterogeneity in the molecular biology of cancer across race and ethnicities. The representation of racial minorities in large genomic sequencing efforts is unclear, and could have an impact on health care disparities. To determine the racial distribution among samples sequenced within The Cancer Genome Atlas (TCGA) and the deficit of samples needed to detect moderately common mutational frequencies in racial minorities. This was a retrospective review of individual patient data from TCGA data portal accessed in July 2015. TCGA comprises samples from a wide array of institutions primarily across the United States. Samples from 10 of the 31 currently available tumor types were analyzed, comprising 5729 samples from the approximately 11 000 available. Using the estimated median somatic mutational frequency, the samples needed beyond TCGA to detect a 10% and 5% mutational frequency over the background somatic mutation frequency were calculated for each tumor type by racial ethnicity. Of the 5729 samples, 77% (n = 4389) were white, 12% (n = 660) were black, 3% (n = 173) were Asian, 3% (n = 149) were Hispanic, and less than 0.5% combined were from patients of Native Hawaiian, Pacific Islander, Alaskan Native, or American Indian decent. This overrepresents white patients compared with the US population and underrepresents primarily Asian and Hispanic patients. With a somatic mutational frequency of 0.7 (prostate cancer) to 9.9 (lung squamous cell cancer), all tumor types from white patients contained enough samples to detect a 10% mutational frequency. This is in contrast to all other racial ethnicities, for which group-specific mutations with 10% frequency would be detectable only for black patients with breast cancer. Group-specific mutations with 5% frequency would be undetectable in any racial minority, but detectable in white patients for all cancer types except lung (adenocarcinoma and squamous cell carcinoma

  16. Complete genome sequence of Acetohalobium arabaticum type strain (Z-7288).

    PubMed

    Sikorski, Johannes; Lapidus, Alla; Chertkov, Olga; Lucas, Susan; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Brambilla, Evelyne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Bruce, David; Detter, Chris; Tapia, Roxanne; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Göker, Markus; Spring, Stefan; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-08-20

    Acetohalobium arabaticum Zhilina and Zavarzin 1990 is of special interest because of its physiology and its participation in the anaerobic C(1)-trophic chain in hypersaline environments. This is the first completed genome sequence of the family Halobacteroidaceae and only the second genome sequence in the order Halanaerobiales. The 2,469,596 bp long genome with its 2,353 protein-coding and 90 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Next-generation sequencing strategies for characterizing the turkey genome.

    PubMed

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  19. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences.Database URL: www.yeastgenome.org.

  20. Draft Genome Sequence for a Urinary Isolate of Nosocomiicoccus ampullae

    PubMed Central

    Hilt, Evann E.; Price, Travis K.; Diebel, Katherine; Putonti, Catherine

    2016-01-01

    A draft genome sequence for a urinary isolate of Nosocomiicoccus ampullae (UMB0853) was investigated. The size of the genome was 1,578,043 bp, with an observed G+C content of 36.1%. Annotation revealed 10 rRNA sequences, 40 tRNA genes, and 1,532 protein-coding sequences. Genome coverage was 727× and consisted of 32 contigs, with an N50 of 109,831 bp. PMID:27856579

  1. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory a number of fungi are used as experimental systems to study a wide range of biological processes and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  2. The complete genome sequence of a dog: a perspective.

    PubMed

    Lee, Soohyun; Kasif, Simon

    2006-06-01

    A complete, high-quality reference sequence of a dog genome was recently produced by a team of researchers led by the Broad Institute, achieving another major milestone in deciphering the genomic landscape of mammalian organisms. The genome sequence provides an indispensable resource for comparative analysis and novel insights into dog and human evolution and history. Together with the survey sequence of a poodle previously published in 2003, the two dog genome sequences allowed identification of more than 2.5 million single nucleotide polymorphisms within and between dog breeds, which can be used in evolutionary analysis, behavioral studies and disease gene mapping.(1)

  3. Mitochondrial genome sequences from wild and cultivated barley (Hordeum vulgare).

    PubMed

    Hisano, Hiroshi; Tsujimura, Mai; Yoshida, Hideya; Terachi, Toru; Sato, Kazuhiro

    2016-10-24

    Sequencing analysis of mitochondrial genomes is important for understanding the evolution and genome structures of various plant species. Barley is a self-pollinated diploid plant with seven chromosomes comprising a large haploid genome of 5.1 Gbp. Wild barley (Hordeum vulgare ssp. spontaneum) and cultivated barley (H. vulgare ssp. vulgare) have cross compatibility and closely related genomes, although a significant number of nucleotide polymorphisms have been reported between their genomes. We determined the complete nucleotide sequences of the mitochondrial genomes of wild and cultivated barley. Two independent circular maps of the 525,599 bp barley mitochondrial genome were constructed by de novo assembly of high-throughput sequencing reads of barley lines H602 and Haruna Nijo, with only three SNPs detected between haplotypes. These mitochondrial genomes contained 33 protein-coding genes, three ribosomal RNAs, 16 transfer RNAs, 188 new ORFs, six major repeat sequences and several types of transposable elements. Of the barley mitochondrial genome-encoded proteins, NAD6, NAD9 and RPS4 had unique structures among grass species. The mitochondrial genome of barley was similar to those of other grass species in terms of gene content, but the configuration of the genes was highly differentiated from that of other grass species. Mitochondrial genome sequencing is essential for annotating the barley nuclear genome; our mitochondrial sequencing identified a significant number of fragmented mitochondrial sequences in the reported nuclear genome sequences. Little polymorphism was detected in the barley mitochondrial genome sequences, which should be explored further to elucidate the evolution of barley.

  4. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    USDA-ARS?s Scientific Manuscript database

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  5. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-20

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.

  6. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now

    PubMed Central

    Engel, Stacia R.; Dietrich, Fred S.; Fisk, Dianna G.; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C.; Dwight, Selina S.; Hitz, Benjamin C.; Karra, Kalpana; Nash, Robert S.; Weng, Shuai; Wong, Edith D.; Lloyd, Paul; Skrzypek, Marek S.; Miyasato, Stuart R.; Simison, Matt; Cherry, J. Michael

    2014-01-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  7. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  8. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.

  9. Complete Genome Sequence of Pigmentation Negative Yersinia Pestis strain Cadman Running head: Complete Genome Sequence of Y. pestis strain Cadman

    DTIC Science & Technology

    2016-10-27

    Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA 9 10 11 Running head: Complete Genome Sequence of Y. pestis strain Cadman...1 Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman 1 2 3 Sean Lovetta, Kitty Chaseb, Galina Korolevaa, Gustavo...we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain 25 lacking the pgm locus. Y. pestis is the causative agent of

  10. The Novel Secreted Factor MIG-18 Acts with MIG-17/ADAMTS to Control Cell Migration in Caenorhabditis elegans

    PubMed Central

    Kim, Hon-Song; Kitano, Yuko; Mori, Masataka; Takano, Tomomi; Harbaugh, Thomas Edward; Mizutani, Kae; Yanagimoto, Haruka; Miwa, Sayaka; Ihara, Shinji; Kubota, Yukihiko; Shibata, Yukimasa; Ikenishi, Kohji; Garriga, Gian; Nishiwaki, Kiyoji

    2014-01-01

    The migration of Caenorhabditis elegans gonadal distal tip cells (DTCs) offers an excellent model to study the migration of epithelial tubes in organogenesis. mig-18 mutants cause meandering or wandering migration of DTCs during gonad formation, which is very similar to that observed in animals with mutations in mig-17, which encodes a secreted metalloprotease of the ADAMTS (a disintegrin and metalloprotease with thrombospondin motifs) family. MIG-18 is a novel secreted protein that is conserved only among nematode species. The mig-17(null) and mig-18 double mutants exhibited phenotypes similar to those in mig-17(null) single mutants. In addition, the mutations in fbl-1/fibulin-1 and let-2/collagen IV that suppress mig-17 mutations also suppressed the mig-18 mutation, suggesting that mig-18 and mig-17 function in a common genetic pathway. The Venus-MIG-18 fusion protein was secreted from muscle cells and localized to the gonadal basement membrane, a tissue distribution reminiscent of that observed for MIG-17. Overexpression of MIG-18 in mig-17 mutants and vice versa partially rescued the relevant DTC migration defects, suggesting that MIG-18 and MIG-17 act cooperatively rather than sequentially. We propose that MIG-18 may be a cofactor of MIG-17/ADAMTS that functions in the regulation of the gonadal basement membrane to achieve proper direction of DTC migration during gonadogenesis. PMID:24318535

  11. The novel secreted factor MIG-18 acts with MIG-17/ADAMTS to control cell migration in Caenorhabditis elegans.

    PubMed

    Kim, Hon-Song; Kitano, Yuko; Mori, Masataka; Takano, Tomomi; Harbaugh, Thomas Edward; Mizutani, Kae; Yanagimoto, Haruka; Miwa, Sayaka; Ihara, Shinji; Kubota, Yukihiko; Shibata, Yukimasa; Ikenishi, Kohji; Garriga, Gian; Nishiwaki, Kiyoji

    2014-02-01

    The migration of Caenorhabditis elegans gonadal distal tip cells (DTCs) offers an excellent model to study the migration of epithelial tubes in organogenesis. mig-18 mutants cause meandering or wandering migration of DTCs during gonad formation, which is very similar to that observed in animals with mutations in mig-17, which encodes a secreted metalloprotease of the ADAMTS (a disintegrin and metalloprotease with thrombospondin motifs) family. MIG-18 is a novel secreted protein that is conserved only among nematode species. The mig-17(null) and mig-18 double mutants exhibited phenotypes similar to those in mig-17(null) single mutants. In addition, the mutations in fbl-1/fibulin-1 and let-2/collagen IV that suppress mig-17 mutations also suppressed the mig-18 mutation, suggesting that mig-18 and mig-17 function in a common genetic pathway. The Venus-MIG-18 fusion protein was secreted from muscle cells and localized to the gonadal basement membrane, a tissue distribution reminiscent of that observed for MIG-17. Overexpression of MIG-18 in mig-17 mutants and vice versa partially rescued the relevant DTC migration defects, suggesting that MIG-18 and MIG-17 act cooperatively rather than sequentially. We propose that MIG-18 may be a cofactor of MIG-17/ADAMTS that functions in the regulation of the gonadal basement membrane to achieve proper direction of DTC migration during gonadogenesis.

  12. Insights from 20 years of bacterial genome sequencing.

    PubMed

    Land, Miriam; Hauser, Loren; Jun, Se-Ran; Nookaew, Intawat; Leuze, Michael R; Ahn, Tae-Hyuk; Karpinets, Tatiana; Lund, Ole; Kora, Guruprased; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-03-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome

  13. Insights from 20 years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V.; Lund, Ole; Kora, Guruprased H.; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W.

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome

  14. Insights from 20 years of bacterial genome sequencing

    DOE PAGES

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about

  15. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome

  16. First complete genome sequence of Bacillus glycinifermentans B-27.

    PubMed

    Stadermann, Kai Bernd; Blom, Jochen; Borgmeier, Claudia; Sciberras, Natalie; Herbold, Sandra; Kipker, Maike; Meurer, Guido; Molck, Stella; Petri, Daniel; Pelzer, Stefan; Schneider, Jessica

    2017-09-10

    The first complete genome sequence of Bacillus glycinifermentans B-27 was determined by SMRT sequencing generating a genome sequence with a total length of 4,607,442 bases. Based on this sequence 4738 protein-coding sequences were predicted and used to identify gene clusters that are related to the production of secondary metabolites such as Lichenysin, Bacillibactin and Bacitracin. This genomic potential combined with the ability of B. glycinifermentans B-27 to grown in bile containing media might contribute to a future application of this strain as probiotic in productive livestock potentially inhibiting competing and pathogenic organisms. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  17. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  18. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome 43 sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the

  19. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    USDA-ARS?s Scientific Manuscript database

    The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

  20. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process.The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers {approx}99% of the euchromatic genome and is accurate to an error rate of {approx}1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number,birth and death. Notably, the human genome seems to encode only20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

  1. Validation of rice genome sequence by optical mapping

    PubMed Central

    Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

    2007-01-01

    Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here points the way to new types of comparative genome analysis that would focus on discernment of structural differences

  2. Microbial genome sequencing using optical mapping and Illumina sequencing

    USDA-ARS?s Scientific Manuscript database

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  3. Genome sequence of Brevibacillus laterosporus strain GI-9.

    PubMed

    Sharma, Vikas; Singh, Pradip K; Midha, Samriti; Ranjan, Manish; Korpole, Suresh; Patil, Prabhu B

    2012-03-01

    We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains.

  4. Draft Genome Sequence of Cystobacter ferrugineus Strain Cbfe23

    PubMed Central

    Akbar, Shukria; Dowd, Scot E.

    2017-01-01

    ABSTRACT In an effort to explore myxobacterial natural product biosynthetic pathways, the draft genome sequence of Cystobacter ferrugineus strain Cbfe23 has been obtained. Analysis of the genome using antiSMASH suggests a multitude of unique natural product biosynthetic pathways. This genome will contribute to the investigation of secondary metabolism in other myxobacterial species. PMID:28183768

  5. Draft Genome Sequence of Archangium sp. Strain Cb G35

    PubMed Central

    Adaikpoh, Barbara I.; Dowd, Scot E.

    2017-01-01

    ABSTRACT In an effort to explore myxobacterial natural product biosynthetic pathways, the draft genome sequence of Archangium sp. strain Cb G35 has been obtained. Analysis of the genome using antiSMASH predicts 49 natural product biosynthetic pathways. This genome will contribute to the investigation of myxobacterial secondary metabolite biosynthetic pathways. PMID:28232451

  6. Cancer Genome Sequencing: Understanding Malignancy as a Disease of the Genome, its Conformation, and its Evolution

    PubMed Central

    Patel, Lalit R.; Nykter, Matti; Chen, Kexin; Zhang, Wei

    2013-01-01

    Advances in cancer genomics have been propelled by the steady evolution of molecular profiling technologies. Over the past decade, high-throughput sequencing technologies have matured to the point necessary to support disease-specific shotgun sequencing. This has compelled whole-genome sequencing studies across a broad panel of malignancies. The emergence of high-throughput sequencing technologies has inspired new chemical and computational techniques enabling interrogation of cancer-specific genomic and transcriptomic variants, previously unannotated genes, and chromatin structure. Finally, recent progress in single-cell sequencing holds great promise for studies interrogating the consequences of tumor evolution in cancers presenting with genomic heterogeneity. PMID:23111104

  7. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge

    PubMed Central

    Reveille, Alexandra M.; Eldridge, Kimberly A.

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  8. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219

    PubMed Central

    Abouelkhair, Mohamed A.; Riley, Matthew C.; Bemis, David A.

    2017-01-01

    ABSTRACT We report the first complete genome sequence of LMG 22219 (=ON 86T = CCUG 49543T), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level. PMID:28209834

  9. Draft Genome Sequence of the Pelagic Photoferrotroph Chlorobium phaeoferrooxidans

    PubMed Central

    Hahn, Aria S.; Morgan-Lang, Connor; Thompson, Katherine J.; Simister, Rachel L.; Llirós, Marc; Hirst, Martin; Hallam, Steven J.

    2017-01-01

    ABSTRACT Here, we report the draft genome sequence of Chlorobium phaeoferrooxidans, a photoferrotrophic member of the genus Chlorobium in the phylum Chlorobi. This genome sequence provides insight into the metabolic capacity that underpins photoferrotrophy within low-light-adapted pelagic Chlorobi. PMID:28360175

  10. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    PubMed Central

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  11. Draft Genome Sequence of Rhodococcus sp. Strain 66b

    PubMed Central

    Myers, Cindy A.; O’Sullivan, Cathryn A.; Roper, Margaret M.

    2017-01-01

    ABSTRACT We report here the draft genome sequence and annotation of Rhodococcus sp. strain 66b isolated from the soil of southwest Western Australia. This strain exhibits a range of bioactivities, including plant growth promotion, biosurfactant production, and wax degradation. Whole-genome sequencing was conducted to uncover the underlying mechanisms. PMID:28546474

  12. Complete Genome Sequence of Lactobacillus plantarum CGMCC 8198

    PubMed Central

    Dong, Qing-Qing; Hu, Hai-Jie; Wang, Qiu-Tong; Gu, Xiang-Chao; Zhou, Hao; Zhou, Wen-Juan; Ni, Xiao-Meng

    2017-01-01

    ABSTRACT We report the complete genome sequence of Lactobacillus plantarum CGMCC 8198, a novel probiotic strain isolated from fermented herbage. We have determined the complete genome sequence of strain L. plantarum CGMCC 8198, which consists of genes that are likely to be involved in dairy fermentation and that have probiotic qualities. PMID:28183756

  13. Complete Genome Sequence of Burkholderia cepacia Strain LO6

    PubMed Central

    Belcaid, Mahdi; Kang, Yun; Tuanyok, Apichai

    2015-01-01

    Burkholderia cepacia strain LO6 is a betaproteobacterium that was isolated from a cystic fibrosis patient. Here we report the 6.4 Mb draft genome sequence assembled into 2 contigs. This genome sequence will aid the transcriptomic profiling of this bacterium and help us to better understand the mechanisms specific to pulmonary infections. PMID:26067955

  14. Complete Genome Sequence of Burkholderia cepacia Strain LO6.

    PubMed

    Belcaid, Mahdi; Kang, Yun; Tuanyok, Apichai; Hoang, Tung T

    2015-06-11

    Burkholderia cepacia strain LO6 is a betaproteobacterium that was isolated from a cystic fibrosis patient. Here we report the 6.4 Mb draft genome sequence assembled into 2 contigs. This genome sequence will aid the transcriptomic profiling of this bacterium and help us to better understand the mechanisms specific to pulmonary infections.

  15. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    SciTech Connect

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-03-05

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  16. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    DOE PAGES

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; ...

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  17. Genome Sequence of Pasteurella multocida Strain Razi_Pm0001

    PubMed Central

    Tadayon, Keyvan

    2017-01-01

    ABSTRACT We report here the genome sequence of Pasteurella multocida Razi_Pm0001 from bovine origin, isolated in Iran in 1936. The genome has a size of 2,360,663 bp, a G+C content of 40.4%, and is predicted to contain 2,052 coding sequences. PMID:28153892

  18. Complete genome sequence of Enterobacter aerogenes KCTC 2190.

    PubMed

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

    2012-05-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

  19. Complete Genome Sequences of Five Paenibacillus larvae Bacteriophages.

    PubMed

    Sheflo, Michael A; Gardner, Adam V; Merrill, Bryan D; Fisher, Joshua N B; Lunt, Bryce L; Breakwell, Donald P; Grose, Julianne H; Burnett, Sandra H

    2013-11-14

    Paenibacillus larvae is a pathogen of honeybees that causes American foulbrood (AFB). We isolated bacteriophages from soil containing bee debris collected near beehives in Utah. We announce five high-quality complete genome sequences, which represent the first completed genome sequences submitted to GenBank for any P. larvae bacteriophage.

  20. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037.

    PubMed

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E; Schäffer, Christina

    2015-06-11

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  1. Draft Genome Sequence of Micrococcus luteus (Schroeter) Cohn (ATCC 12698).

    PubMed

    Putonti, Catherine; Cudone, Evan; Kalesinskas, Laurynas; Engelbrecht, Kathleen C; Koenig, David W; Wolfe, Alan J

    2017-07-06

    The actinobacterium Micrococcus luteus can be found in a wide variety of habitats. Here, we report the 2,411,958-bp draft genome sequence of the type strain M. leuteus (Schroeter) Cohn (ATCC 12698). Characteristic of this taxa, the genome sequence has a high G+C content, 73.14%. Copyright © 2017 Putonti et al.

  2. Full Genome Sequence of Giant Panda Rotavirus Strain CH-1

    PubMed Central

    Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang

    2013-01-01

    We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists. PMID:23469354

  3. Almost finished: the complete genome sequence of Mycosphaerella graminicola

    USDA-ARS?s Scientific Manuscript database

    Mycosphaerella graminicola causes septoria tritici blotch of wheat. An 8.9x shotgun sequence of bread wheat strain IPO323 was generated through the Community Sequencing Program of the U.S. Department of Energy’s Joint Genome Institute (JGI), and was finished at the Stanford Human Genome Center. The ...

  4. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  5. Draft genome sequence of Kocuria rhizophila P7-4.

    PubMed

    Kim, Woo-Jin; Kim, Young-Ok; Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Lee, Jun-Seo; Kong, Hee Jeong; Nam, Bo-Hye; Kim, Bong-Seok; Lee, Sang-Jun; Park, Hong-Seog; Chae, Sung-Hwa

    2011-08-01

    We report the draft genome sequence of Kocuria rhizophila P7-4, which was isolated from the intestine of Siganus doliatus caught in the Pacific Ocean. The 2.83-Mb genome sequence consists of 75 large contigs (>100 bp in size) and contains 2,462 predicted protein-coding genes.

  6. Draft Genome Sequence of Vibrio (Listonella) anguillarum ATCC 14181

    PubMed Central

    Grim, Christopher J.

    2016-01-01

    We report the draft genome sequence of Vibrio anguillarum ATCC 14181, a Gram-negative, hemolytic, O2 serotype marine bacterium that causes mortality in mariculture species. The availability of this genome sequence will add to our knowledge of diversity and virulence mechanisms of Vibrio anguillarum as well as other pathogenic Vibrio spp. PMID:27795288

  7. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  8. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  9. Draft Genome Sequence of the Suttonella ornithocola Bacterium

    PubMed Central

    Waldman Ben-Asher, Hiba; Yerushalmi, Rebecca; Wachtel, Chaim; Barbiro-Michaely, Efrat

    2017-01-01

    ABSTRACT   We report here the draft genome sequence of the Suttonella ornithocola bacterium. To date, this bacterium, found in birds, passed only phylogenetic and phenotypic analyses. To our knowledge, this is the first publication of the Suttonella ornithocola genome sequence. The genetic profile provides a basis for further analysis of its infection pathways. PMID:28209820

  10. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219.

    PubMed

    Abouelkhair, Mohamed A; Riley, Matthew C; Bemis, David A; Kania, Stephen A

    2017-02-16

    We report the first complete genome sequence of LMG 22219 (=ON 86(T) = CCUG 49543(T)), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level.

  11. Selection of sequence variants to improve dairy cattle genomic predictions

    USDA-ARS?s Scientific Manuscript database

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  12. Single-cell RNA sequencing reveals an altered gene expression pattern as a result of CRISPR/cas9-mediated deletion of Gene 33/Mig6 and chronic exposure to hexavalent chromium in human lung epithelial cells.

    PubMed

    Park, Soyoung; Zhang, Xiaowen; Li, Cen; Yin, Changhong; Li, Jiangwei; Fallon, John T; Huang, Weihua; Xu, Dazhong

    2017-09-01

    Gene 33 (Mig6, ERRFI1) is an adaptor protein with multiple cellular functions. We recently reported that depletion of this protein promotes lung epithelial cell transformation induced by hexavalent chromium [Cr(VI)]. However, the early molecular events that mediate this process are not clear. In the present study, we used single-cell RNA sequencing to compare gene expression profiles between BEAS-2B lung epithelial cells chronically exposed to a sublethal dose of Cr(VI) with or without CRISPR/cas9-mediated deletion of Gene 33. Our data reveal 83 differentially expressed genes. The most notable changes are genes associated with cell adhesion, oxidative stresses, protein ubiquitination, epithelial-mesenchymal transition/metastasis, and WNT signaling. Up-regulation of some neuro-specific genes is also evident, particularly ubiquitin carboxyl-terminal hydrolase L1 (UCHL1), a deubiquitinase and potential biomarker for lung cancer. Gene 33 deletion and/or Cr(VI) exposure did not cause discernable changes in cell morphology. However, Gene 33 deletion led to a modest but significant reduction of cells in the G2/M phase of the cell cycle regardless of Cr(VI) exposure. Gene 33 deletion also significantly reduced cell proliferation. Interestingly, Cr(VI) exposure eliminated the difference in cell proliferation between the two genotypes. Gene 33 deletion also significantly elevated cell migration. Our data indicate that combined Gene 33 deletion and chronic Cr(VI) exposure produces a gene expression pattern and a phenotype resemble those of the transformed lung epithelial cells. Given the known association of UCHL1 with lung cancer, we propose that UCHL1 is an important player in the early stage of lung epithelial cell transformation and tumorigenesis. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Haplotype-resolved genome sequencing of a Gujarati Indian individual.

    PubMed

    Kitzman, Jacob O; Mackenzie, Alexandra P; Adey, Andrew; Hiatt, Joseph B; Patwardhan, Rupali P; Sudmant, Peter H; Ng, Sarah B; Alkan, Can; Qiu, Ruolan; Eichler, Evan E; Shendure, Jay

    2011-01-01

    Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.

  14. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing

    PubMed Central

    2016-01-01

    This article is a mini-review that provides a general overview for next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Studies on transcriptomes are now available, from previous-hybridization based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genomes databases and providing appropriate human reference genomes, which is a necessary component of personalized medicine and precision medicine. PMID:27915479

  15. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing.

    PubMed

    Park, Sang Tae; Kim, Jayoung

    2016-11-01

    This article is a mini-review that provides a general overview for next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Studies on transcriptomes are now available, from previous-hybridization based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genomes databases and providing appropriate human reference genomes, which is a necessary component of personalized medicine and precision medicine.

  16. From complete genome sequence to “complete“ understanding?

    PubMed Central

    Galperin, Michael Y.; Koonin, Eugene V.

    2011-01-01

    The rapidly accumulating genome sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. A major problem in genomics is the widening gap between the rapid progress in genome sequencing and the comparatively slow progress in the functional characterization of sequenced genomes. Here we discuss two key questions of genome biology: whether we need more genomes, and how deep is our understanding of biology based on genomic analysis. We argue that overly specific annotations of gene functions are often less useful than the more generic, but also more robust, functional assignments based on protein family classification. We also discuss problems in understanding the functions of the remaining “conserved hypothetical” genes. PMID:20647113

  17. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data.

  18. The Brachypodium genome sequence: a resource for oat genomics research

    USDA-ARS?s Scientific Manuscript database

    Oat (Avena sativa) is an important cereal crop used as both an animal feed and for human consumption. Genetic and genomic research on oat is hindered because it is hexaploid and possesses a large (13 Gb) genome. Diploid Avena relatives have been employed for genetic and genomic studies, but only mod...

  19. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  20. Draft Sequences of the Radish (Raphanus sativus L.) Genome

    PubMed Central

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-01-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  1. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  2. Genome Sequence of Human Rhinovirus A22, Strain Lancaster/2015

    PubMed Central

    Atkinson, Kate V.; Bishop, Lisa A.; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R.; Hegarty, Matthew J.; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M.; Pickup, Roger W.; Wilkinson, Mark

    2017-01-01

    ABSTRACT The genome of human rhinovirus A22 (HRV-A22) was assembled by deep sequencing RNA samples from nasopharyngeal swabs. The assembled genome is 8.7% divergent from the HRV-A22 reference strain over its full length, and it is only the second full-length genome sequence for HRV-A22. The new strain is designated strain HRV-A22/Lancaster/2015. PMID:28336607

  3. The genome sequence of Schizosaccharomyces pombe.

    PubMed

    Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

    2002-02-21

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

  4. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  5. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.

  6. Cancer whole-genome sequencing: present and future.

    PubMed

    Nakagawa, H; Wardell, C P; Furuta, M; Taniguchi, H; Fujimoto, A

    2015-12-03

    Recent explosive advances in next-generation sequencing technology and computational approaches to massive data enable us to analyze a number of cancer genome profiles by whole-genome sequencing (WGS). To explore cancer genomic alterations and their diversity comprehensively, global and local cancer genome-sequencing projects, including ICGC and TCGA, have been analyzing many types of cancer genomes mainly by exome sequencing. However, there is limited information on somatic mutations in non-coding regions including untranslated regions, introns, regulatory elements and non-coding RNAs, and rearrangements, sometimes producing fusion genes, and pathogen detection in cancer genomes remain widely unexplored. WGS approaches can detect these unexplored mutations, as well as coding mutations and somatic copy number alterations, and help us to better understand the whole landscape of cancer genomes and elucidate functions of these unexplored genomic regions. Analysis of cancer genomes using the present WGS platforms is still primitive and there are substantial improvements to be made in sequencing technologies, informatics and computer resources. Taking account of the extreme diversity of cancer genomes and phenotype, it is also required to analyze much more WGS data and integrate these with multi-omics data, functional data and clinical-pathological data in a large number of sample sets to interpret them more fully and efficiently.

  7. Progress in Understanding and Sequencing the Genome of Brassica rapa

    PubMed Central

    Hong, Chang Pyo; Kwon, Soo-Jin; Kim, Jung Sun; Yang, Tae-Jin; Park, Beom-Seok; Lim, Yong Pyo

    2008-01-01

    Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassicaceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization. PMID:18288250

  8. Deep sequencing of 10,000 human genomes

    PubMed Central

    Pierce, Levi C. T.; Biggs, William H.; di Iulio, Julia; Wong, Emily H. M.; Fabani, Martin M.; Kirkness, Ewen F.; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C.; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A.; Och, Franz J.; Turpaz, Yaron; Venter, J. Craig

    2016-01-01

    We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use. PMID:27702888

  9. Deep sequencing of 10,000 human genomes.

    PubMed

    Telenti, Amalio; Pierce, Levi C T; Biggs, William H; di Iulio, Julia; Wong, Emily H M; Fabani, Martin M; Kirkness, Ewen F; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A; Och, Franz J; Turpaz, Yaron; Venter, J Craig

    2016-10-18

    We report on the sequencing of 10,545 human genomes at 30×-40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

  10. Sequences associated with centromere competency in the human genome.

    PubMed

    Hayden, Karen E; Strome, Erin D; Merrett, Stephanie L; Lee, Hye-Ran; Rudd, M Katharine; Willard, Huntington F

    2013-02-01

    Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance.

  11. Sequences Associated with Centromere Competency in the Human Genome

    PubMed Central

    Hayden, Karen E.; Strome, Erin D.; Merrett, Stephanie L.; Lee, Hye-Ran; Rudd, M. Katharine

    2013-01-01

    Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance. PMID:23230266

  12. [Human genome sequencing--next generation technology or will the routine sequencing of human genome be possible?].

    PubMed

    Pospísilová, S; Tichý, B; Mayer, J

    2009-01-01

    DNA sequencing has become a standard method widely used in molecular genetic analysis of biological materials. Its use in medicine is widespread, especially in diagnostics of inherited disorders and cancer related diseases. Development of DNA diagnostics has been strongly accelerated by publication of the human genome sequence in 2001. During the last few years one can observe rapid development of novel sequencing technologies, which have led to the introduction of so called "New Generation Sequencing". These new technologies based on principles of massive parallel sequencing (e.g. Roche/454, Illumina Genome Analyzer IIx, Life Technologies SOLiD 3 and others) enable a massive increase of sequencing capacity and in parallel also a fundamental decrease of costs. This major technological breakthrough allowed development of the whole-genome sequencing including analyses of individual human genomes. It also started the era of personal genomics. The first sequenced individual human genomes belonged to famous geneticists J. C. Venter (2007) and J. D. Watson (2008), but they were rapidly followed by sequencing analyses of other individuals from various ethnic groups. These studies brought substantial information about interpersonal differences in genome structure (through characterization of nucleotide polymorphisms, DNA deletions and amplifications etc.). Sequencing of cancer cell genomes, e.g. acute myeloid leukemia has already brought first important clinically relevant results. Although currently we are still unable to interpret the relevance of all detected genome variants, it is obvious, that the possibility to sequence individual human genomes represents a fundamental breakthrough not only in DNA diagnostics but also in clinical medicine.

  13. Expression and purification of recombinant human Mig in Escherichia coli and its comparison with murine Mig.

    PubMed

    Qian, Lan; Zhu, Shunying; Shen, Jiaqing; Han, Xiaodong; Gao, Jin; Wu, Mingyuan; Yu, Yan; Lu, Huili; Han, Wei

    2012-03-01

    Monokine induced by IFN-γ (Mig) is a member of CXC-chemokines and recruits T-lymphocytes to activate the immune response. In recent years, it has raised much interest in the areas of autoimmune disease and allograft rejection, as the production of recombinant human Mig (rHuMig) would be of considerable significance for both research and potential clinical use. Here we report the expression, preparation and characterization of non-tagged recombinant human Mig (rHuMig) using a prokaryotic expression system. Following expression in Escherichia coli (E. coli) BL21, the 103 amino acid residue of rHuMig was purified from bacteria inclusion bodies with a one-step S-Sepharose cation exchange chromatography. The product was immunologically characterized via Western blot and its purity was determined via SDS-PAGE and silver staining to be above 99%, with an endotoxin level <0.5EU/μg via a chemotaxis assay, rHuMig demonstrated chemotactic activity on mouse spleen lymphocytes with an ED50 of 15 ng/mL. Additionally, using a proliferation assay, rHuMig significantly inhibited proliferation of the human bladder cell line T24. In vivo experiments revealed that rHuMig could inhibit mouse bone marrow mononuclear cells cycling into the S-phase and reduced intestinal cell proliferation. Our results demonstrate that rHuMig is fully functional in the mouse model. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  15. Real-time, portable genome sequencing for Ebola surveillance.

    PubMed

    Quick, Joshua; Loman, Nicholas J; Duraffour, Sophie; Simpson, Jared T; Severi, Ettore; Cowley, Lauren; Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan H J; Becker-Ziaja, Beate; Boettcher, Jan Peter; Cabeza-Cabrerizo, Mar; Camino-Sánchez, Álvaro; Carter, Lisa L; Doerrbecker, Juliane; Enkirch, Theresa; García-Dorival, Isabel; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigael; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallasch, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Y; Sachse, Andreas; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Racine, Trina; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N'Faly; Williams, Cecelia V; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Frank; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, James; Rachwal, Phillip; Turner, Daniel J; Pollakis, Georgios; Hiscox, Julian A; Matthews, David A; O'Shea, Matthew K; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Wölfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A; Koivogui, Lamine; Diallo, Boubacar; Keïta, Sakoba; Rambaut, Andrew; Formenty, Pierre; Günther, Stephan; Carroll, Miles W

    2016-02-11

    The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10(-3) and 1.42 × 10(-3) mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.

  16. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    PubMed

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.

  17. Returning genome sequences to research participants: Policy and practice.

    PubMed

    Wright, Caroline F; Middleton, Anna; Barrett, Jeffrey C; Firth, Helen V; FitzPatrick, David R; Hurles, Matthew E; Parker, Michael

    2017-02-24

    Despite advances in genomic science stimulating an explosion of literature around returning health-related findings, the possibility of returning entire genome sequences to individual research participants has not been widely considered. Through direct involvement in large-scale translational genomics studies, we have identified a number of logistical challenges that would need to be overcome prior to returning individual genome sequence data, including verifying that the data belong to the requestor and providing appropriate informatics support. In addition, we identify a number of ethico-legal issues that require careful consideration, including returning data to family members, mitigating against unintended consequences, and ensuring appropriate governance. Finally, recognising that there is an opportunity cost to addressing these issues, we make some specific pragmatic suggestions for studies that are considering whether to share individual genomic datasets with individual study participants. If data are shared, research should be undertaken into the personal, familial and societal impact of receiving individual genome sequence data.

  18. Returning genome sequences to research participants: Policy and practice

    PubMed Central

    2017-01-01

    Despite advances in genomic science stimulating an explosion of literature around returning health-related findings, the possibility of returning entire genome sequences to individual research participants has not been widely considered. Through direct involvement in large-scale translational genomics studies, we have identified a number of logistical challenges that would need to be overcome prior to returning individual genome sequence data, including verifying that the data belong to the requestor and providing appropriate informatics support. In addition, we identify a number of ethico-legal issues that require careful consideration, including returning data to family members, mitigating against unintended consequences, and ensuring appropriate governance. Finally, recognising that there is an opportunity cost to addressing these issues, we make some specific pragmatic suggestions for studies that are considering whether to share individual genomic datasets with individual study participants. If data are shared, research should be undertaken into the personal, familial and societal impact of receiving individual genome sequence data. PMID:28317033

  19. Savant: genome browser for high-throughput sequencing data.

    PubMed

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  20. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C.; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chuyu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela; Panaud, Olivier; Kellogg, Elizabeth A.; Brutnell, Thomas P.; Doust, Andrew N.; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  1. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Yang, Xiaohan; Ye, Chuyu; Tuskan, Gerald A

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The {approx}400-Mb assembly covers {approx}80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  2. Reference genome sequence of the model plant Setaria.

    PubMed

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chu-Yu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela C; Panaud, Olivier; Kellogg, Elizabeth A; Brutnell, Thomas P; Doust, Andrew N; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-05-13

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ∼400-Mb assembly covers ∼80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  3. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    PubMed

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we described a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.

  4. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    PubMed Central

    Deakin, Janine E.

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  5. First Complete Genome Sequences of Two Keystone Viruses from Florida

    PubMed Central

    Stockwell, Timothy B.; Heberlein-Larson, Lea A.; Tan, Yi; Halpin, Rebecca A.; Fedorova, Nadia; Katzel, Daniel A.; Smole, Sandra; Unnasch, Thomas R.; Kramer, Laura D.

    2015-01-01

    We report here the first complete sequences of two Keystone virus (KEYV) genomes isolated from Florida in 2005, which include the first two publicly available complete large (L) gene sequences. The sequences of the KEYV L segments show 75.99 to 83.86% nucleotide similarity with those of other viruses in the California (CAL) serogroup of bunyaviruses. PMID:26514762

  6. Selecting sequence variants to improve genomic predictions for dairy cattle

    USDA-ARS?s Scientific Manuscript database

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  7. Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

    USDA-ARS?s Scientific Manuscript database

    We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...

  8. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    PubMed Central

    Sikorski, Johannes; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Brettin, Thomas; Detter, John C.; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as an energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of cytochrome c nitrite reductase. Here, we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2,291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304697

  10. The revolution of whole genome sequencing to study parasites.

    PubMed

    Forrester, Sarah Jayne; Hall, Neil

    2014-07-01

    Genome sequencing has revolutionized the way in which we approach biological research from fundamental molecular biology to ecology and epidemiology. In the last 10 years the field of genomics has changed enormously as technology has improved and the tools for genomic sequencing have moved out of a few dedicated centers and now can be performed on bench-top instruments. In this review we will cover some of the key discoveries that were catalyzed by some of the first genome projects and discuss how this field is developing, what the new challenges are and how this may impact on research in the near future.

  11. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  12. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Spirosoma linguale type strain (1T)

    SciTech Connect

    Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth H; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Detter, J. Chris; Schutze, Andrea; Rohde, Manfred; Tindall, Brian; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-01-01

    Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete ge-nome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plas-mids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacte-ria and Archaea project.

  14. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    PubMed Central

    Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304674

  15. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Corrosion Behavior of MIG Brazed and MIG Welded Joints of Automotive DP600-GI Steel Sheet

    NASA Astrophysics Data System (ADS)

    Basak, Sushovan; Das, Hrishikesh; Pal, Tapan Kumar; Shome, Mahadev

    2016-12-01

    Galvanized dual-phase steel sheets are extensively used by the auto industry for their corrosion resistance property. Welding by the metal inert gas (MIG) process causes degradation of the steel in the vicinity of the joint due to excessive zinc evaporation. In order to minimize Zn loss, the MIG brazing process has been tried out in lap joint configuration over a heat input range of 136-204 J mm-1. The amount of zinc loss, intermetallic formation and corrosion properties in the joint area has been evaluated for both MIG brazing and MIG welding. Corrosion rate of 21 mm year-1 has been reduced to 2 mm year-1 by adopting MIGB in place MIGW. Impedance study has shown that the corrosion mechanism in base metal, MIG brazed and MIG welded joints is dominated by charge transfer, diffusion and mixed mode control processes, respectively.

  17. Clinical Utility of the Mig-SCog.

    PubMed

    Gil-Gouveia, Raquel; Oliveira, António G; Pavão Martins, Isabel

    2016-06-01

    Mig-SCog is a 9-item questionnaire developed to quantify attack-related cognitive complaints in migraine (M). The items relate to executive function and language, and the total Mig-SCog score is the sum of those scales. To evaluate the Mig-SCog scores regarding cognitive symptoms during a variety of conditions. We conducted a prospective comparative study of the Mig-SCog scores (1) between migraine and tension-type headache (TTH) patients during a headache; (2) in migraine patients between migraine attacks, non-headache pain and pain-free status; (3) in migraine patients during and outside a migraine attack. One hundred forty-nine patients (98 M and 51 TTHA). Total Mig-SCog score was higher in migraine patients than TTH (8.0 ± 4.1 vs 3.4 ± 3.2, P < .0001). Sixty-three patients took part in the next part of the study. Migraine patients rated the Mig-SCog higher for migraine (7.9 ± 4.6) than for non-headache pain (2.3 ± 2.9, P < .0006) or pain-free (1.6 ± 2.4, P < .0006). In the final phase of the study, 38 patients Mig-SCog scores were not significantly different whether obtained during or outside an M attack (P = .26). Attack-related subjective cognitive symptoms, assessed by Mig-SCog scores, differed between migraine and TTH patients. The Mig-SCog scores from migraine patients were found to be higher during migraine than during non-headache pain or pain-free conditions. Patient scoring from memory for usual attacks was not significantly different to scoring within attacks, We believe this demonstrates negligible recall bias. © 2016 American Headache Society.

  18. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  19. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  20. Alfresco—A Workbench for Comparative Genomic Sequence Analysis

    PubMed Central

    Jareborg, Niclas; Durbin, Richard

    2000-01-01

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented. PMID:10958633

  1. Oxford Nanopore MinION Sequencing and Genome Assembly.

    PubMed

    Lu, Hengyun; Giordano, Francesca; Ning, Zemin

    2016-10-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production makes it suitable for real-time applications, the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited. Copyright © 2016 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  2. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phy- lum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shal- low hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the com- plete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein- coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) La- boratory Sequencing Program (LSP) project.

  3. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1

    PubMed Central

    Kalischuk, Melanie; Hachey, John

    2015-01-01

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions. PMID:26272557

  4. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1.

    PubMed

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-08-13

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions.

  5. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    PubMed

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  6. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats

    PubMed Central

    van der Weide, Robin H.; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts. PMID:27501045

  7. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    PubMed

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  8. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  9. Mig-14 plays an important role in influencing gene expression of Salmonella enterica serovar Typhi, which contributes to cell invasion under hyperosmotic conditions.

    PubMed

    Sheng, Xiumei; Zhang, Hong; Xia, Qiufeng; Xu, Shungao; Xu, Huaxi; Huang, Xinxiang

    2013-11-01

    mig-14 is a horizontally acquired host-induced virulence gene in Salmonella enterica serovar Typhi. The molecular function of mig-14 is still unknown; sequence analysis showed that mig-14 shared homology with the helix-loop-helix motif of the AraC family of transcriptional regulatory proteins. In our previous microarray-based studies, mig-14 was upregulated at the early stage of high osmotic stress, indicating a potential role under this condition. Therefore, we compared growth and the global transcriptional difference between wild-type and mig-14 mutant strains to identify the role of Mig-14. The results showed that growth of mig-14 mutant strain was clearly slower than that of the wild-type strain, and 148 genes showed significant differences in expression between these two strains under upshift high osmotic treatment for 30 min. In total, 77 genes and 71 genes in the mig-14 mutant strain were upregulated and downregulated, respectively. Genes involved in invasion, virulence, flagellation, motility and chemotaxis of Salmonella were downregulated. Thus, cell invasion abilities of these two strains were further analyzed. The results confirmed that activities of mig-14 were important for cell invasion. Copyright © 2013 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  10. Accurate whole human genome sequencing using reversible terminator chemistry.

    PubMed

    Bentley, David R; Balasubramanian, Shankar; Swerdlow, Harold P; Smith, Geoffrey P; Milton, John; Brown, Clive G; Hall, Kevin P; Evers, Dirk J; Barnes, Colin L; Bignell, Helen R; Boutell, Jonathan M; Bryant, Jason; Carter, Richard J; Keira Cheetham, R; Cox, Anthony J; Ellis, Darren J; Flatbush, Michael R; Gormley, Niall A; Humphray, Sean J; Irving, Leslie J; Karbelashvili, Mirian S; Kirk, Scott M; Li, Heng; Liu, Xiaohai; Maisinger, Klaus S; Murray, Lisa J; Obradovic, Bojan; Ost, Tobias; Parkinson, Michael L; Pratt, Mark R; Rasolonjatovo, Isabelle M J; Reed, Mark T; Rigatti, Roberto; Rodighiero, Chiara; Ross, Mark T; Sabot, Andrea; Sankar, Subramanian V; Scally, Aylwyn; Schroth, Gary P; Smith, Mark E; Smith, Vincent P; Spiridou, Anastassia; Torrance, Peta E; Tzonev, Svilen S; Vermaas, Eric H; Walter, Klaudia; Wu, Xiaolin; Zhang, Lu; Alam, Mohammed D; Anastasi, Carole; Aniebo, Ify C; Bailey, David M D; Bancarz, Iain R; Banerjee, Saibal; Barbour, Selena G; Baybayan, Primo A; Benoit, Vincent A; Benson, Kevin F; Bevis, Claire; Black, Phillip J; Boodhun, Asha; Brennan, Joe S; Bridgham, John A; Brown, Rob C; Brown, Andrew A; Buermann, Dale H; Bundu, Abass A; Burrows, James C; Carter, Nigel P; Castillo, Nestor; Chiara E Catenazzi, Maria; Chang, Simon; Neil Cooley, R; Crake, Natasha R; Dada, Olubunmi O; Diakoumakos, Konstantinos D; Dominguez-Fernandez, Belen; Earnshaw, David J; Egbujor, Ugonna C; Elmore, David W; Etchin, Sergey S; Ewan, Mark R; Fedurco, Milan; Fraser, Louise J; Fuentes Fajardo, Karin V; Scott Furey, W; George, David; Gietzen, Kimberley J; Goddard, Colin P; Golda, George S; Granieri, Philip A; Green, David E; Gustafson, David L; Hansen, Nancy F; Harnish, Kevin; Haudenschild, Christian D; Heyer, Narinder I; Hims, Matthew M; Ho, Johnny T; Horgan, Adrian M; Hoschler, Katya; Hurwitz, Steve; Ivanov, Denis V; Johnson, Maria Q; James, Terena; Huw Jones, T A; Kang, Gyoung-Dong; Kerelska, Tzvetana H; Kersey, Alan D; Khrebtukova, Irina; Kindwall, Alex P; Kingsbury, Zoya; Kokko-Gonzales, Paula I; Kumar, Anil; Laurent, Marc A; Lawley, Cynthia T; Lee, Sarah E; Lee, Xavier; Liao, Arnold K; Loch, Jennifer A; Lok, Mitch; Luo, Shujun; Mammen, Radhika M; Martin, John W; McCauley, Patrick G; McNitt, Paul; Mehta, Parul; Moon, Keith W; Mullens, Joe W; Newington, Taksina; Ning, Zemin; Ling Ng, Bee; Novo, Sonia M; O'Neill, Michael J; Osborne, Mark A; Osnowski, Andrew; Ostadan, Omead; Paraschos, Lambros L; Pickering, Lea; Pike, Andrew C; Pike, Alger C; Chris Pinkard, D; Pliskin, Daniel P; Podhasky, Joe; Quijano, Victor J; Raczy, Come; Rae, Vicki H; Rawlings, Stephen R; Chiva Rodriguez, Ana; Roe, Phyllida M; Rogers, John; Rogert Bacigalupo, Maria C; Romanov, Nikolai; Romieu, Anthony; Roth, Rithy K; Rourke, Natalie J; Ruediger, Silke T; Rusman, Eli; Sanches-Kuiper, Raquel M; Schenker, Martin R; Seoane, Josefina M; Shaw, Richard J; Shiver, Mitch K; Short, Steven W; Sizto, Ning L; Sluis, Johannes P; Smith, Melanie A; Ernest Sohna Sohna, Jean; Spence, Eric J; Stevens, Kim; Sutton, Neil; Szajkowski, Lukasz; Tregidgo, Carolyn L; Turcatti, Gerardo; Vandevondele, Stephanie; Verhovsky, Yuli; Virk, Selene M; Wakelin, Suzanne; Walcott, Gregory C; Wang, Jingwen; Worsley, Graham J; Yan, Juying; Yau, Ling; Zuerlein, Mike; Rogers, Jane; Mullikin, James C; Hurles, Matthew E; McCooke, Nick J; West, John S; Oaks, Frank L; Lundberg, Peter L; Klenerman, David; Durbin, Richard; Smith, Anthony J

    2008-11-06

    DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

  11. Physician Experiences and Understanding of Genomic Sequencing in Oncology.

    PubMed

    Weipert, Caroline M; Ryan, Kerry A; Everett, Jessica N; Yashar, Beverly M; Chinnaiyan, Arul M; Scott Roberts, J; De Vries, Raymond; Zikmund-Fisher, Brian J; Raymond, Victoria M

    2017-08-24

    The amount of information produced by genomic sequencing is vast, technically complicated, and can be difficult to interpret. Appropriately tailoring genomic information for non-geneticists is an essential next step in the clinical use of genomic sequencing. To initiate development of a framework for genomic results communication, we conducted eighteen qualitative interviews with oncologists who had referred adult cancer patients to a matched tumor-normal tissue genomic sequencing study. In our qualitative analysis, we found varied levels of clinician knowledge relating to sequencing technology, the scope of the tumor genomic sequencing study, and incidental germline findings. Clinicians expressed a perceived need for more genetics education. Additionally, they had a variety of suggestions for improving results reports and possible resources to aid in results interpretation. Most clinicians felt genetic counselors were needed when incidental germline findings were identified. Our research suggests that more consistent genetics education is imperative in ensuring the proper utilization of genomic sequencing in cancer care. Clinician suggestions for results interpretation resources and results report modifications could be used to improve communication. Clinicians' perceived need to involve genetic counselors when incidental germline findings were found suggests genetic specialists could play a critical role in ensuring patients receive appropriate follow-up.

  12. Composition and organization of active centromere sequences in complex genomes

    PubMed Central

    2012-01-01

    Background Centromeres are sites of chromosomal spindle attachment during mitosis and meiosis. While the sequence basis for centromere identity remains a subject of considerable debate, one approach is to examine the genomic organization at these active sites that are correlated with epigenetic marks of centromere function. Results We have developed an approach to characterize both satellite and non-satellite centromeric sequences that are missing from current assemblies in complex genomes, using the dog genome as an example. Combining this genomic reference with an epigenetic dataset corresponding to sequences associated with the histone H3 variant centromere protein A (CENP-A), we identify active satellite sequence domains that appear to be both functionally and spatially distinct within the overall definition of satellite families. Conclusions These findings establish a genomic and epigenetic foundation for exploring the functional role of centromeric sequences in the previously sequenced dog genome and provide a model for similar studies within the context of less-characterized genomes. PMID:22817545

  13. Comparative genomics of first available bovine Anaplasma phagocytophilum genome obtained with targeted sequence capture.

    PubMed

    Dugat, Thibaud; Loux, Valentin; Marthey, Sylvain; Moroldo, Marco; Lagrée, Anne-Claire; Boulouis, Henri-Jean; Haddad, Nadia; Maillard, Renaud

    2014-11-17

    Anaplasma phagocytophilum is a zoonotic and obligate intracellular bacterium transmitted by ticks. In domestic ruminants, it is the causative agent of tick-borne fever, which causes significant economic losses in Europe. As A. phagocytophilum is difficult to isolate and cultivate, only nine genome sequences have been published to date, none of which originate from a bovine strain.Our goals were to; 1/ develop a sequencing methodology which efficiently circumvents the difficulties associated with A. phagocytophilum isolation and culture; 2/ describe the first genome of a bovine strain; and 3/ compare it with available genomes, in order to both explore key genomic features at the species level, and to identify candidate genes that could be specific to bovine strains. DNA was extracted from a bovine blood sample infected by A. phagocytophilum. Following a whole genome capture approach, A. phagocytophilum DNA was enriched 197-fold in the sample and then sequenced using Illumina technology. In total, 58.9% of obtained reads corresponded to the A. phagocytophilum genome, covering 85.3% of the HZ genome. Then by performing comparisons with nine previously-sequenced A. phagocytophilum genomes, we determined the core genome of these ten strains. Following analysis, 1281 coding DNA sequences, including 1001 complete sequences, were detected in the A. phagocytophilum bovine genome, of which four appeared to be unique to the bovine isolate. These four coding DNA sequences coded for "hypothetical proteins of unknown function" and require further analysis. We also identified nine proteins common to both European domestic ruminants tested. Using a whole genome capture approach, we have sequenced the first A. phagocytophilum genome isolated from a cow. To the best of our knowledge, this is the first time that this method has been used to selectively enrich pathogenic bacterial DNA from samples also containing host DNA. The four proteins unique to the A. phagocytophilum bovine

  14. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  15. Real-time, portable genome sequencing for Ebola surveillance

    PubMed Central

    Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan HJ; Becker-Ziaja, Beate; Boettcher, Jan-Peter; Cabeza-Cabrerizo, Mar; Camino-Sanchez, Alvaro; Carter, Lisa L.; Doerrbecker, Juiliane; Enkirch, Theresa; Dorival, Isabel Graciela García; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigail; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallash, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Yasmin; Sachse, Andrea; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Trina, Racine; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N’Faly; Williams, Cecelia V.; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Franck; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, Jamie; Rachwal, Philip; Turner, Daniel; Pollakis, Georgios; Hiscox, Julian A.; Matthews, David A.; O’Shea, Matthew K.; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Woelfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A.; Koivogui, Lamine; Diallo, Boubacar; Keita, Sakoba; Rambaut, Andrew; Formenty, Pierre; Gunther, Stephan; Carroll, Miles W.

    2016-01-01

    The Ebola virus disease (EVD) epidemic in West Africa is the largest on record, responsible for >28,599 cases and >11,299 deaths 1. Genome sequencing in viral outbreaks is desirable in order to characterize the infectious agent to determine its evolutionary rate, signatures of host adaptation, identification and monitoring of diagnostic targets and responses to vaccines and treatments. The Ebola virus genome (EBOV) substitution rate in the Makona strain has been estimated at between 0.87 × 10−3 to 1.42 × 10−3 mutations per site per year. This is equivalent to 16 to 27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic 2-7. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought-after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions 8. Genomic surveillance during the epidemic has been sporadic due to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities 9. In order to address this problem, we devised a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. Here we present sequence data and analysis of 142 Ebola virus (EBOV) samples collected during the period March to October 2015. We were able to generate results in less than 24 hours after receiving an Ebola positive sample, with the sequencing process taking as little as 15-60 minutes. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485

  16. The Genome Sequence of Avibacterium paragallinarum Strain CL Has a Large Repertoire of Insertion Sequence Elements.

    PubMed

    Horta-Valerdi, Guillermo; Sanchez-Alonso, Maria Patricia; Perez-Marquez, Victor M; Negrete-Abascal, Erasmo; Vaca-Pacheco, Sergio; Hernandez-Gonzalez, Ismael; Gomez-Lunar, Zulema; Olmedo-Álvarez, Gabriela; Vázquez-Cruz, Candelario

    2017-04-13

    The draft genome sequence of Avibacterium paragallinarum strain CL serovar C is reported here. The genome comprises 154 contigs corresponding to 2.4 Mb with 41% G+C content and many insertion sequence (IS) elements, a characteristic not previously reported in A. paragallinarum. Copyright © 2017 Horta-Valerdi et al.

  17. The Genome Sequence of Avibacterium paragallinarum Strain CL Has a Large Repertoire of Insertion Sequence Elements

    PubMed Central

    Horta-Valerdi, Guillermo; Sanchez-Alonso, Maria Patricia; Perez-Marquez, Victor M.; Negrete-Abascal, Erasmo; Vaca-Pacheco, Sergio; Hernandez-Gonzalez, Ismael; Gomez-Lunar, Zulema; Olmedo-Álvarez, Gabriela

    2017-01-01

    ABSTRACT The draft genome sequence of Avibacterium paragallinarum strain CL serovar C is reported here. The genome comprises 154 contigs corresponding to 2.4 Mb with 41% G+C content and many insertion sequence (IS) elements, a characteristic not previously reported in A. paragallinarum. PMID:28408672

  18. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    Treesearch

    Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to...

  19. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    USDA-ARS?s Scientific Manuscript database

    New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

  20. Draft Genome Sequences of 24 Microbial Strains Assembled from Direct Sequencing from 4 Stool Samples

    PubMed Central

    Hernández, Álvaro; White, Bryan A.; O’Brien, Daniel; Ahlquist, David; Boardman, Lisa

    2015-01-01

    The ability to assemble genomes from metagenomic sequencing avoids the need for culture and any associated culture biases. We assembled 24 essentially complete draft genomes from metagenomic pair-end and size-selected mate pair sequencing from 4 stool samples, 2 from subjects diagnosed with colorectal cancer and 2 from healthy controls. PMID:26021920

  1. Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato

    PubMed Central

    Macias-Muñoz, Aide; Briscoe, Adriana D.

    2014-01-01

    Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145

  2. Complete genome sequence of Cellulomonas flavigena type strain (134).

    PubMed

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-07-29

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  4. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  5. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    PubMed Central

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304688

  6. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    SciTech Connect

    Abt, Birte; Foster, Brian; Lapidus, Alla L.; Clum, Alicia; Sun, Hui; Pukall, Rudiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  7. The Release 6 reference sequence of the Drosophila melanogaster genome.

    PubMed

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.

  8. The Release 6 reference sequence of the Drosophila melanogaster genome

    DOE PAGES

    Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; ...

    2015-01-14

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less

  9. The Release 6 reference sequence of the Drosophila melanogaster genome

    PubMed Central

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  10. Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project

    PubMed Central

    Kong, Nguyet; Davis, Matthew; Arabyan, Narine; Huang, Bihua C.; Weis, Allison M.; Chen, Poyin; Thao, Kao; Ng, Whitney; Chin, Ning; Foutouhi, Soraya; Foutouhi, Azarene; Kaufman, James; Xie, Yi; Storey, Dylan B.

    2017-01-01

    ABSTRACT Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus. PMID:28705963

  11. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution

    USDA-ARS?s Scientific Manuscript database

    Genetic and genomic analyses of Upland cotton (Gossypium hirsutum) are difficult because it has a complex allotetraploid (AADD; 2n = 4x = 52) genome. Here we sequenced, assembled and analyzed the world's most important cultivated cotton genome with 246.2 gigabase (Gb) clean data obtained using whol...

  12. 100K Pathogen Genome Project: 306 Listeria Draft Genome Sequences for Food Safety and Public Health.

    PubMed

    Chen, Poyin; Kong, Nguyet; Huang, Bihua; Thao, Kao; Ng, Whitney; Storey, Dylan Bobby; Arabyan, Narine; Foutouhi, Azarene; Foutouhi, Soraya; Weimer, Bart C

    2017-02-09

    Listeria monocytogenes is a food-associated bacterium that is responsible for food-related illnesses worldwide. This is the initial public release of 306 L. monocytogenes genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in L. monocytogenes.

  13. 100K Pathogen Genome Project: 306 Listeria Draft Genome Sequences for Food Safety and Public Health

    PubMed Central

    Chen, Poyin; Kong, Nguyet; Huang, Bihua; Thao, Kao; Ng, Whitney; Storey, Dylan Bobby; Arabyan, Narine; Foutouhi, Azarene; Foutouhi, Soraya

    2017-01-01

    ABSTRACT Listeria monocytogenes is a food-associated bacterium that is responsible for food-related illnesses worldwide. This is the initial public release of 306 L. monocytogenes genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in L. monocytogenes. PMID:28183778

  14. Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project.

    PubMed

    Kong, Nguyet; Davis, Matthew; Arabyan, Narine; Huang, Bihua C; Weis, Allison M; Chen, Poyin; Thao, Kao; Ng, Whitney; Chin, Ning; Foutouhi, Soraya; Foutouhi, Azarene; Kaufman, James; Xie, Yi; Storey, Dylan B; Weimer, Bart C

    2017-07-13

    Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus. Copyright © 2017 Kong et al.

  15. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    SciTech Connect

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  16. Genome Sequence of Escherichia coli Tailed Phage Utah

    PubMed Central

    Leavitt, Justin C.; Heitkamp, Alexandra J.; Bhattacharjee, Ananda S.; Gilcrease, Eddie B.

    2017-01-01

    ABSTRACT Escherichia coli bacteriophage Utah is a member of the chi-like tailed phage cluster in the Siphoviridae family. We report here the complete 59,024-bp sequence of the genome of phage Utah. PMID:28360173

  17. Draft Genome Sequences of 24 Lactococcus lactis Strains

    PubMed Central

    Backus, Lennart; Wels, Michiel; Boekhorst, Jos; Dijkstra, Annereinou R.; Beerthuyzen, Marke; Kelly, William J.; Siezen, Roland J.; van Hijum, Sacha A. F. T.

    2017-01-01

    ABSTRACT The lactic acid bacterium Lactococcus lactis is widely used for the production of fermented dairy products. Here, we present the draft genome sequences of 24 L. lactis strains isolated from different environments and geographic locations. PMID:28360177

  18. Draft Genome Sequences of Gammaproteobacterial Methanotrophs Isolated from Marine Ecosystems

    PubMed Central

    Flynn, James D.; Hirayama, Hisako; Sakai, Yasuyoshi; Dunfield, Peter F.; Knief, Claudia; Op den Camp, Huub J. M.; Jetten, Mike S. M.; Khmelenina, Valentina N.; Trotsenko, Yuri A.; Murrell, J. Colin; Semrau, Jeremy D.; Svenning, Mette M.; Stein, Lisa Y.; Kyrpides, Nikos; Shapiro, Nicole; Woyke, Tanja; Bringel, Françoise; Vuilleumier, Stéphane; DiSpirito, Alan A.

    2016-01-01

    The genome sequences of Methylobacter marinus A45, Methylobacter sp. strain BBA5.1, and Methylomarinum vadi IT-4 were obtained. These aerobic methanotrophs are typical members of coastal and hydrothermal vent marine ecosystems. PMID:26798114

  19. Draft Genome Sequences of Three Mycobacterium chimaera Respiratory Isolates.

    PubMed

    Mac Aogáin, Micheál; Roycroft, Emma; Raftery, Philomena; Mok, Simone; Fitzgibbon, Margaret; Rogers, Thomas R

    2015-12-03

    Mycobacterium chimaera is an opportunistic human pathogen implicated in both pulmonary and cardiovascular infections. Here, we report the draft genome sequences of three strains isolated from human respiratory specimens. Copyright © 2015 Mac Aogáin et al.

  20. Draft Genome Sequences of Nine Cyanobacterial Strains from Diverse Habitats.

    PubMed

    Zhu, Tao; Hou, Shengwei; Lu, Xuefeng; Hess, Wolfgang R

    2017-03-02

    Here, we report the annotated draft genome sequences of nine different cyanobacteria, which were originally collected from different habitats, including hot springs, terrestrial, freshwater, and marine environments, and cover four of the five morphological subsections of cyanobacteria.

  1. Sequence analysis of the complete mitochondrial genome of Youxian sheldrake.

    PubMed

    He, Shao-Ping; Liu, Li-Li; Yu, Qi-Fang; Li, Si; He, Jian-Hua

    2016-01-01

    Youxian sheldrake is excellent native breeds in Hunan province in China. The complete mitochondrial (mt) genome sequence plays an important role in the accurate determination of phylogenetic relationships among metazoans. This is the first study to determine the complete mitochondrial genome sequence of Youxian sheldrake using PCR-based amplification and Sanger sequencing. The characteristic of the entire mitochondrial genome was analyzed in detail, the total length of the mitogenome is 16,605 bp, with the base composition of 29.21% A, 22.18% T, 32.84% C, 15.77% G in the Youxian sheldrake. It contained 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of Youxian sheldrake provided an important data for further study of the phylogenetics of poultry, and available data for the genetics and breeding.

  2. Draft Genome Sequence of Paecilomyces hepiali, Isolated from Cordyceps sinensis.

    PubMed

    Yu, Yi; Wang, Wenting; Wang, Linping; Pang, Fang; Guo, Lanping; Song, Lai; Liu, Guiming; Feng, Chengqiang

    2016-07-07

    Paecilomyces hepiali is an endoparasitic fungus that commonly exists in the natural Cordyceps sinensis Here, we report the draft genome sequence of P. hepiali, which will facilitate the exploitation of medicinal compounds produced by the fungus.

  3. Draft Genome Sequence of Paecilomyces hepiali, Isolated from Cordyceps sinensis

    PubMed Central

    Yu, Yi; Wang, Wenting; Wang, Linping; Pang, Fang; Guo, Lanping; Song, Lai

    2016-01-01

    Paecilomyces hepiali is an endoparasitic fungus that commonly exists in the natural Cordyceps sinensis. Here, we report the draft genome sequence of P. hepiali, which will facilitate the exploitation of medicinal compounds produced by the fungus. PMID:27389266

  4. Genome Sequence of Mycoplasma hyorhinis Strain DBS 1050.

    PubMed

    Dabrazhynetskaya, Alena; Soika, Valerii; Volokhov, Dmitriy; Simonyan, Vahan; Chizhikov, Vladimir

    2014-03-06

    Mycoplasma hyorhinis is known as one of the most prevalent contaminants of mammalian cell and tissue cultures worldwide. Here, we present the complete genome sequence of the fastidious M. hyorhinis strain DBS 1050.

  5. Genome Sequence of Mycoplasma hyorhinis Isolated from Cell Cultures

    PubMed Central

    Cibulski, Samuel Paulo; Siqueira, Franciele Maboni; Teixeira, Thais Fumaco; Mayer, Fabiana Quoos; Almeida, Luiz Gonzaga

    2016-01-01

    Mycoplasmas are major contaminants of mammalian cell cultures. Here, the complete genome sequence of Mycoplasma hyorhinis recovered from Madin-Darby bovine kidney (MDBK) cells is reported. PMID:27738034

  6. Transcriptome and genome sequencing uncovers functional variation in humans.

    PubMed

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; 't Hoen, Peter A C; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk P J; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Angel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-09-26

    Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

  7. Draft Genome Sequences of 11 Lactococcus lactis subsp. cremoris Strains

    PubMed Central

    Backus, Lennart; Boekhorst, Jos; Dijkstra, Annereinou; Beerthuyzen, Marke; Siezen, Roland J.; Bachmann, Herwig; van Hijum, Sacha A. F. T.

    2017-01-01

    ABSTRACT The lactic acid bacterium Lactococcus lactis is widely used for the fermentation of dairy products. Here, we present the draft genome sequences of 11 L. lactis subsp. cremoris strains isolated from different environments. PMID:28302789

  8. Complete genome sequence of Allochromatium vinosum DSM 180T

    PubMed Central

    Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-juan; Detter, John C.; Han, Cliff; Hauser, Loren; Jeffries, Cynthia D.; Land, Miriam; Munk, A. Christine; Tapia, Roxanne; Dahl, Christiane

    2011-01-01

    Allochromatium vinosum formerly Chromatium vinosum is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium contains currently five species. All members were isolated from freshwater, brackish water or marine habitats and are predominately obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program. PMID:22675582

  9. Draft Genome Sequences of Nine Cyanobacterial Strains from Diverse Habitats

    PubMed Central

    Zhu, Tao; Hou, Shengwei

    2017-01-01

    ABSTRACT Here, we report the annotated draft genome sequences of nine different cyanobacteria, which were originally collected from different habitats, including hot springs, terrestrial, freshwater, and marine environments, and cover four of the five morphological subsections of cyanobacteria. PMID:28254973

  10. Complete Genome Sequences of Six Strains of the Genus Methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; UI Hague, Muhammad Farhan; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Goodwin, Lynne A.; Ivanov, Pavel S.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  11. Complete genome sequences of six strains of the genus methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; Farhan Ul Haque, Muhammad; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Aguero, Fernan; Goodwin, Lynne A.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  12. Genome Sequence of the Immunomodulatory Strain Bifidobacterium bifidum LMG 13195

    PubMed Central

    Gueimonde, Miguel; Ventura, Marco; Margolles, Abelardo

    2012-01-01

    In this work, we report the genome sequences of Bifidobacterium bifidum strain LMG13195. Results from our research group show that this strain is able to interact with human immune cells, generating functional regulatory T cells. PMID:23209243

  13. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    SciTech Connect

    Martinez, Robert J; Bruce, David; Detter, J C; Goodwin, Lynne A.; Han, James; Han, Cliff; Held, Brittany; Land, Miriam L; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobeckya, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  14. Draft Genome Sequences of Three Mycobacterium chimaera Respiratory Isolates

    PubMed Central

    Roycroft, Emma; Raftery, Philomena; Mok, Simone; Fitzgibbon, Margaret; Rogers, Thomas R.

    2015-01-01

    Mycobacterium chimaera is an opportunistic human pathogen implicated in both pulmonary and cardiovascular infections. Here, we report the draft genome sequences of three strains isolated from human respiratory specimens. PMID:26634757

  15. Genome sequence of the fish pathogen Flavobacterium columnare ATCC 49512

    USDA-ARS?s Scientific Manuscript database

    Flavobacterium columnare is a Gram-negative, rod shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512. ...

  16. Genome Sequence of the Fish Pathogen Flavobacterium columnare ATCC 49512

    PubMed Central

    Tekedar, Hasan C.; Karsi, Attila; Gillaspy, Allison F.; Dyer, David W.; Benton, Nicole R.; Zaitshik, Jeremy; Vamenta, Stefanie; Banes, Michelle M.; Gülsoy, Nagihan; Aboko-Cole, Mary; Waldbieser, Geoffrey C.

    2012-01-01

    Flavobacterium columnare is a Gram-negative, rod-shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512. PMID:22535941

  17. Draft Genome Sequence of Hydrotalea flava Strain CCUG 51397T.

    PubMed

    Leite, Laura Rabelo; Medeiros, Julliane Dutra; Fernandes, Gabriel Rocha; Araújo, Flávio; Pylro, Victor Satler; Salim, Anna Christina; Volpini, Ângela; Oliveira, Guilherme; Cuadros-Orellana, Sara

    2016-06-09

    We report the draft genome sequence of Hydrotalea flava CCUG 51397(T), the type strain of the genus Hydrotalea (family Chitinophagaceae), isolated from water samples in southern Sweden. Copyright © 2016 Leite et al.

  18. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    PubMed Central

    Bruce, David; Detter, Chris; Goodwin, Lynne A.; Han, James; Han, Cliff S.; Held, Brittany; Land, Miriam L.; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis. PMID:22582378

  19. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    PubMed

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus.

  20. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  1. Genome Sequence of Mycobacterium Phage Waterfoul

    PubMed Central

    Jackson, Paige N.; Embry, Ella K.; Johnson, Christa O.; Watson, Tiara L.; Weast, Sayre K.; DeGraw, Caroline J.; Douglas, Jessica R.; Sellers, J. Michael; D’Angelo, William A.

    2016-01-01

    Waterfoul is a newly isolated temperate siphovirus of Mycobacterium smegmatis mc2155. It was identified as a member of the K5 cluster of Mycobacterium phages and has a 61,248-bp genome with 95 predicted genes. PMID:27856585

  2. Genome Sequence of Cluster W Mycobacteriophage Taptic

    PubMed Central

    Mageeney, Catherine M.; Seier, Emily R.; Esposito, Elise C.; Graham, Lee H.; Heckman, Emily L.; Hipwell, Chelsea M.; Kelliher, Allison B.; Lando, Nicole A.; Morales, Patricia Y.; Russell, Daniel A.; Tsaousis, Barbara E.; Kenna, Margaret A.

    2017-01-01

    ABSTRACT The Taptic genome is the first to be annotated from the W cluster of mycobacteriophages infecting Mycobacterium smegmatis mc2155. All 92 predicted open reading frames (ORFs) and a single tRNA specifying glycine (tRNA-gly) are transcribed rightward. Many functionally uncharacterized ORFs appear to be W cluster specific, as nucleotide similarity is shared only with other W cluster genomes. PMID:28302785

  3. Draft genome sequence of Therminicola potens strain JR

    SciTech Connect

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  4. Complete genome sequence of the giant Pseudomonas phage Lu11.

    PubMed

    Adriaenssens, E M; Mattheus, W; Cornelissen, A; Shaburova, O; Krylov, V N; Kropinski, A M; Lavigne, R

    2012-06-01

    The complete genome sequence of the giant Pseudomonas phage Lu11 was determined, comparing 454 and Sanger sequencing. The double-stranded DNA (dsDNA) genome is 280,538 bp long and encodes 391 open reading frames (ORFs) and no tRNAs. The closest relative is Ralstonia phage ϕRSL1, encoding 40 similar proteins. As such, Lu11 can be considered phylogenetically unique within the Myoviridae and indicates the diversity of the giant phages within this family.

  5. Complete Mitochondrial Genome Sequence of Sunflower (Helianthus annuus L.)

    PubMed Central

    Ebert, Daniel P.; Kane, Nolan C.; Rieseberg, Loren H.

    2016-01-01

    This is the first complete mitochondrial genome sequence for sunflower and the first complete mitochondrial genome for any member of Asteraceae, the largest plant family, which includes over 23,000 named species. The master circle is 300,945-bp long and includes 27 protein-coding sequences, 18 tRNAs, and the 26S, 5S, and 18S rRNAs. PMID:27635002

  6. Complete Genome Sequences of 38 Gordonia sp. Bacteriophages

    PubMed Central

    Montgomery, Matthew T.; Bonilla, J. Alfred; Dejong, Randall; Garlena, Rebecca A.; Guerrero Bustamante, Carlos; Klyczek, Karen K.; Russell, Daniel A.; Wertz, John T.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2017-01-01

    ABSTRACT We report here the genome sequences of 38 newly isolated bacteriophages using Gordonia terrae 3612 (ATCC 25594) and Gordonia neofelifaecis NRRL59395 as bacterial hosts. All of the phages are double-stranded DNA (dsDNA) tail phages with siphoviral morphologies, with genome sizes ranging from 17,118 bp to 93,843 bp and spanning considerable nucleotide sequence diversity. PMID:28057748

  7. Draft Genome Sequence of Klebsiella pneumoniae AWD5

    PubMed Central

    Rajkumari, Jina; Singha, L. Paikhomba

    2017-01-01

    ABSTRACT Here, we report the draft genome sequence of Klebsiella pneumoniae strain AWD5, isolated from an automobile workshop in India. The de novo assembly resulted in a 4,807,409 bp genome containing 25 rRNA genes, 81 tRNAs, and 4,636 coding sequences (CDS). It carries important genes for polyaromatic hydrocarbon degradation and benzoate degradation. PMID:28153891

  8. Draft genome sequence of Gluconobacter thailandicus NBRC 3257

    PubMed Central

    Matsutani, Minenosuke; Yakushi, Toshiharu

    2014-01-01

    Gluconobacter thailandicus strain NBRC 3257, isolated from downy cherry (Prunus tomentosa), is a strict aerobic rod-shaped Gram-negative bacterium. Here, we report the features of this organism, together with the draft genome sequence and annotation. The draft genome sequence is composed of 107 contigs for 3,446,046 bp with 56.17% G+C content and contains 3,360 protein-coding genes and 54 RNA genes. PMID:25197448

  9. Use of information theory to study genome sequences

    NASA Astrophysics Data System (ADS)

    Ohya, Masanori; Sato, Keiko

    2000-12-01

    The genome sequence carries information about life as an order of four bases. It is considered that this order indicates a special code structure. In this paper we discuss how the mutual entropy, the main concept in Shannon's communication theory, can be used to study genome sequences, and how a measure introduced in our previous paper [10] for the analysis of similarities of code structures is applied for examining the coding structure of several species, in particular, HIV-1.

  10. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    SciTech Connect

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  11. Single-Molecule Sequencing of the Drosophila serrata Genome

    PubMed Central

    Allen, Scott L.; Delaney, Emily K.; Kopp, Artyom; Chenoweth, Stephen F.

    2017-01-01

    Long-read sequencing technology promises to greatly enhance de novo assembly of genomes for nonmodel species. Although the error rates of long reads have been a stumbling block, sequencing at high coverage permits the self-correction of many errors. Here, we sequence and de novo assemble the genome of Drosophila serrata, a species from the montium subgroup that has been well-studied for latitudinal clines, sexual selection, and gene expression, but which lacks a reference genome. Using 11 PacBio single-molecule real-time (SMRT cells), we generated 12 Gbp of raw sequence data comprising ∼65 × whole-genome coverage. Read lengths averaged 8940 bp (NRead50 12,200) with the longest read at 53 kbp. We self-corrected reads using the PBDagCon algorithm and assembled the genome using the MHAP algorithm within the PBcR assembler. Total genome length was 198 Mbp with an N50 just under 1 Mbp. Contigs displayed a high degree of chromosome arm-level conservation with the D. melanogaster genome and many could be sensibly placed on the D. serrata physical map. We also provide an initial annotation for this genome using in silico gene predictions that were supported by RNA-seq data. PMID:28143951

  12. Draft Genome Sequence of Subantarctic Rhodococcus sp. Strain 1139

    PubMed Central

    Baker, Anthony L.; Charleston, Michael A.; Britz, Margaret L.

    2017-01-01

    ABSTRACT The draft genome sequence of subantarctic Rhodococcus sp. strain 1139 is reported here. The genome size is 7.04 Mb with high G+C content (62.3%) and it contains a large number of genes involved in lipid synthesis. This lipid synthesis system is characteristic of oleaginous Actinobacteria, which are of interest for biofuel production. PMID:28385836

  13. Genome sequences of Listeria monocytogenes strains with resistance to arsenic

    USDA-ARS?s Scientific Manuscript database

    Listeria monocytogenes frequently exhibits resistance to arsenic. We report here the draft genome sequences of eight genetically diverse arsenic-resistant L. monocytogenes strains from human listeriosis and food-associated environments. Availability of these genomes would help to elucidate the role ...

  14. Draft Genome Sequences of Itaconate-Producing Ustilaginaceae

    PubMed Central

    Geiser, Elena; Ludwig, Florian; Zambanini, Thiemo; Blank, Lars M.

    2016-01-01

    Some smut fungi of the family Ustilaginaceae produce itaconate from glucose. De novo genome sequencing of nine itaconate-producing Ustilaginaceae revealed genome sizes between 19 and 25 Mbp. Comparison to the itaconate cluster of U. maydis MB215 revealed all essential genes for itaconate production contributing to metabolic engineering for improving itaconate production. PMID:27979931

  15. Insights into vertebrate evolution from the chicken genome sequence

    PubMed Central

    Furlong, Rebecca F

    2005-01-01

    The chicken has recently joined the ever-growing list of fully sequenced animal genomes. Its unique features include expanded gene families involved in egg and feather production as well as more surprising large families, such as those for olfactory receptors. Comparisons with other vertebrate genomes move us closer to defining a set of essential vertebrate genes. PMID:15693954

  16. Draft Genome Sequence of Pseudomonas sp. nov. H2

    PubMed Central

    Loftie-Eaton, Wesley; Suzuki, Haruo; Bashford, Kelsie; Heuer, Holger; Stragier, Pieter; De Vos, Paul; Settles, Matthew L.

    2015-01-01

    We report the draft genome sequence of Pseudomonas sp. nov. H2, isolated from creek sediment in Moscow, ID, USA. The strain is most closely related to Pseudomonas putida. However, it has a slightly smaller genome that appears to have been impacted by horizontal gene transfer and poorly maintains IncP-1 plasmids. PMID:25838493

  17. Complete Genome Sequence of Spiroplasma sp. TU-14

    PubMed Central

    Lo, Wen-Sui; Haryono, Mindia; Gasparich, Gail E.

    2017-01-01

    ABSTRACT Spiroplasma sp. TU-14 was isolated from a contaminated sample of Entomoplasma lucivorax PIPN-2T obtained from the International Organization for Mycoplasmology collection. Here, we report the complete genome sequence of this bacterium to facilitate the investigation of its biology and the comparative genomics among Spiroplasma spp. PMID:28082500

  18. Complete genome sequence of Aeromonas hydrophila AL06-06

    USDA-ARS?s Scientific Manuscript database

    Aeromonas hydrophila occurs in freshwater environments and infects fish and mammals. In this work, we report the complete genome sequence of Aeromonas hydrophila AL06-06, which was isolated from diseased goldfish and is being used for comparative genomic studies with A. hydrophila strains causing ba...

  19. Draft genome sequence of Phomopsis longicolla MSPL 10-6

    USDA-ARS?s Scientific Manuscript database

    Phomopsis longicolla T.W. Hobbs is the primary cause of Phomopsis seed decay in soybean. We report the de novo assembled draft genome sequence of P. longicolla isolate MSPL10-6 with a 54.8-fold depth of coverage. The resulting draft genome was estimated to be approximately 64 Mb in size with an over...

  20. Complete Genome Sequence of Bacillus thuringiensis Strain 407 Cry-

    PubMed Central

    Poehlein, Anja; Rosenstiel, Philip; Liesegang, Heiko

    2013-01-01

    Bacillus thuringiensis is an insect pathogen that has been used widely as a biopesticide. Here, we report the genome sequence of strain 407 Cry-, which is used to study the genetic determinants of pathogenicity. The genome consists of a 5.5-Mb chromosome and nine plasmids, including a novel 502-kb megaplasmid. PMID:23405326

  1. Draft Genome Sequence of Tannerella forsythia Clinical Isolate 9610

    PubMed Central

    Hanson-Drury, Sesha; Liu, Quanhui; Vo, Anh T.; Kim, Michelle; Watling, Michael; Bumgarner, Roger S.

    2017-01-01

    ABSTRACT We present here the draft genome sequence of Tannerella forsythia 9610, a clinical isolate obtained from a periodontitis patient. The genome is composed of 79 scaffolds with 82 contigs, for a length of 3,201,941 bp and a G+C of 47.3%. PMID:28336586

  2. Complete Genome Sequence of Cronobacter sakazakii Strain CMCC 45402.

    PubMed

    Zhao, Zhijing; Wang, Lei; Wang, Bin; Liang, Haoyu; Ye, Qiang; Zeng, Ming

    2014-01-16

    Cronobacter sakazakii is considered to be an important pathogen involved in life-threatening neonatal infections. Here, we report the annotated complete genome sequence of C. sakazakii strain CMCC 45402, obtained from a milk sample in China. The major findings from the genomic analysis provide a better understanding of the isolates from China.

  3. Draft Genome Sequence of Enterobacter cloacae Strain JD6301

    PubMed Central

    Wilson, Jessica G.; French, William T.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Woyke, Tanja; Shapiro, Nicole; Bullard, James W.; Champlin, Franklin R.

    2014-01-01

    Enterobacter cloacae strain JD6301 was isolated from a mixed culture with wastewater collected from a municipal treatment facility and oleaginous microorganisms. A draft genome sequence of this organism indicates that it has a genome size of 4,772,910 bp, an average G+C content of 53%, and 4,509 protein-coding genes. PMID:24874669

  4. A snapshot of the emerging tomato genome sequence

    USDA-ARS?s Scientific Manuscript database

    The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

  5. Complete Genome Sequence of the Oncolytic Sendai virus Strain Moscow

    PubMed Central

    Zainutdinov, Sergei S.; Tikunov, Artem Y.; Matveeva, Olga V.

    2016-01-01

    We report here the complete genome sequence of Sendai virus Moscow strain. Anecdotal evidence for the efficacy of oncolytic virotherapy exists for this strain. The RNA genome of the Moscow strain is 15,384 nucleotides in length and differs from the nearest strain, BB1, by 18 nucleotides and 11 amino acids. PMID:27516510

  6. Complete genome sequence of Campylobacter gracilis ATCC 33236T

    USDA-ARS?s Scientific Manuscript database

    The human oral pathogen Campylobacter gracilis has been isolated from periodontal and endodontal infections, and also from non-oral head, neck or lung infections. This study describes the whole-genome sequence of the human periodontal isolate ATCC 33236T (=FDC 1084), which is the first closed genome...

  7. Complete Genome Sequence of Pseudomonas aeruginosa Phage AAT-1.

    PubMed

    Andrade-Domínguez, Andrés; Kolter, Roberto

    2016-08-25

    Aspects of the interaction between phages and animals are of interest and importance for medical applications. Here, we report the genome sequence of the lytic Pseudomonas phage AAT-1, isolated from mammalian serum. AAT-1 is a double-stranded DNA phage, with a genome of 57,599 bp, containing 76 predicted open reading frames.

  8. Complete Genome Sequence of Desulfovibrio piger FI11049

    PubMed Central

    Nueno Palop, Carmen; Mayer, Melinda J.; Crost, Emmanuelle; Narbad, Arjan

    2017-01-01

    ABSTRACT The complete genome sequence of Desulfovibrio piger FI11049 was determined. The genome consists of a single circular chromosome of 2,807,531 bp encoding seven rRNA operons, 76 tRNA genes, and 2,535 coding genes. PMID:28209813

  9. Complete genome sequence of pronghorn virus, a pestivirus

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  10. Complete genome sequence of pronghorn virus, a pestivirus.

    PubMed

    Neill, John D; Ridpath, Julia F; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

    2014-06-12

    The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids.

  11. Complete Genome Sequence of Pronghorn Virus, a Pestivirus

    PubMed Central

    Ridpath, Julia F.; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

    2014-01-01

    The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids. PMID:24926058

  12. Complete Genome Sequence of Bacteroides ovatus V975

    PubMed Central

    Goesmann, Alexander; Carding, Simon R.

    2016-01-01

    The complete genome sequence of Bacteroides ovatus V975 was determined. The genome consists of a single circular chromosome of 6,475,296 bp containing five rRNA operons, 68 tRNA genes, and 4,959 coding genes. PMID:27908995

  13. Complete Genome Sequence of Bacteroides ovatus V975.

    PubMed

    Wegmann, Udo; Goesmann, Alexander; Carding, Simon R

    2016-12-01

    The complete genome sequence of Bacteroides ovatus V975 was determined. The genome consists of a single circular chromosome of 6,475,296 bp containing five rRNA operons, 68 tRNA genes, and 4,959 coding genes. Copyright © 2016 Wegmann et al.

  14. Complete Mitochondrial Genome Sequence of the Pezizomycete Pyronema confluens

    PubMed Central

    2016-01-01

    The complete mitochondrial genome of the ascomycete Pyronema confluens has been sequenced. The circular genome has a size of 191 kb and contains 48 protein-coding genes, 26 tRNA genes, and two rRNA genes. Of the protein-coding genes, 14 encode conserved mitochondrial proteins, and 31 encode predicted homing endonuclease genes. PMID:27174271

  15. Mitochondrial Genome Sequence of the Glass Sponge Oopsacas minuta

    PubMed Central

    Santini, Sébastien; Rocher, Caroline; Le Bivic, André

    2015-01-01

    We report the complete mitochondrial genome sequence of the Mediterranean glass sponge Oopsacas minuta. This 19-kb mitochondrial genome has 24 noncoding genes (22 tRNAs and 2 rRNAs) and 14 protein-encoding genes coding for 11 subunits of respiratory chain complexes and 3 ATP synthase subunits. PMID:26227597

  16. Draft genome sequences of 10 strains of the genus exiguobacterium.

    PubMed

    Vishnivetskaya, Tatiana A; Chauhan, Archana; Layton, Alice C; Pfiffner, Susan M; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C; Markowitz, Victor M; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W; Pati, Amrita; Stamatis, Dimitrios; Reddy, T B K; Shapiro, Nicole; Nordberg, Henrik P; Cantor, Michael N; Hua, X Susan; Woyke, Tanja

    2014-10-16

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes.

  17. Draft Genome Sequences of 10 Strains of the Genus Exiguobacterium

    PubMed Central

    Chauhan, Archana; Layton, Alice C.; Pfiffner, Susan M.; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C.; Markowitz, Victor M.; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W.; Pati, Amrita; Stamatis, Dimitrios; Reddy, T. B. K.; Shapiro, Nicole; Nordberg, Henrik P.; Cantor, Michael N.; Hua, X. Susan; Woyke, Tanja

    2014-01-01

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes. PMID:25323723

  18. The tomato genome sequence provides insight into fleshy fruit evolution

    USDA-ARS?s Scientific Manuscript database

    The genome of the inbred tomato cultivar ‘Heinz 1706’ was sequenced and assembled using a combination of Sanger and “next generation” technologies. The predicted genome size is ~900 Mb, consistent with prior estimates, of which 760 Mb were assembled in 91 scaffolds aligned to the 12 tomato chromosom...

  19. Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge

    PubMed Central

    Cornell, Jessica L.; Breslin, Eileen; Schuhmacher, Zachary; Himelright, Madison; Berluti, Cassandra; Boyd, Charles; Carson, Rachel; Del Gallo, Elle; Giessler, Caris; Gilliam, Benjamin; Heatherly, Catherine; Nevin, Julius; Nguyen, Bryan; Nguyen, Justin; Parada, Jocelyn; Sutterfield, Blake; Tukruni, Muruj

    2016-01-01

    Smudge, a bacteriophage enriched from soil using Bacillus thuringiensis DSM-350 as the host, had its complete genome sequenced. Smudge is a myovirus with a genome consisting of 292 genes and was identified as belonging to the C1 cluster of Bacillus phages. PMID:27540049

  20. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  1. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    PubMed

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  2. Complete genome sequence of Streptococcus thermophilus strain ND03.

    PubMed

    Sun, Zhihong; Chen, Xia; Wang, Jicheng; Zhao, Wenjing; Shao, Yuyu; Wu, Lan; Zhou, Zhemin; Sun, Tiansong; Wang, Lei; Meng, He; Zhang, Heping; Chen, Wei

    2011-02-01

    Streptococcus thermophilus strain ND03 is a Chinese commercial dairy starter used for the manufacture of yogurt. It was isolated from naturally fermented yak milk in Qinghai, China. We present here the complete genome sequence of ND03 and compare it to three other published genomes of Streptococcus thermophilus strains.

  3. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  4. Genome sequence of the olive tree, Olea europaea.

    PubMed

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.

  5. Sequencing viral genomes from a single isolated plaque.

    PubMed

    Depew, Jessica; Zhou, Bin; McCorrison, Jamison M; Wentworth, David E; Purushe, Janaki; Koroleva, Galina; Fouts, Derrick E

    2013-06-06

    Whole genome sequencing of viruses and bacteriophages is often hindered because of the need for large quantities of genomic material. A method is described that combines single plaque sequencing with an optimization of Sequence Independent Single Primer Amplification (SISPA). This method can be used for de novo whole genome next-generation sequencing of any cultivable virus without the need for large-scale production of viral stocks or viral purification using centrifugal techniques. A single viral plaque of a variant of the 2009 pandemic H1N1 human Influenza A virus was isolated and amplified using the optimized SISPA protocol. The sensitivity of the SISPA protocol presented here was tested with bacteriophage F_HA0480sp/Pa1651 DNA. The amplified products were sequenced with 454 and Illumina HiSeq platforms. Mapping and de novo assemblies were performed to analyze the quality of data produced from this optimized method. Analysis of the sequence data demonstrated that from a single viral plaque of Influenza A, a mapping assembly with 3590-fold average coverage representing 100% of the genome could be produced. The de novo assembled data produced contigs with 30-fold average sequence coverage, representing 96.5% of the genome. Using only 10 pg of starting DNA from bacteriophage F_HA0480sp/Pa1651 in the SISPA protocol resulted in sequencing data that gave a mapping assembly with 3488-fold average sequence coverage, representing 99.9% of the reference and a de novo assembly with 45-fold average sequence coverage, representing 98.1% of the genome. The optimized SISPA protocol presented here produces amplified product that when sequenced will give high quality data that can be used for de novo assembly. The protocol requires only a single viral plaque or as little as 10 pg of DNA template, which will facilitate rapid identification of viruses during an outbreak and viruses that are difficult to propagate.

  6. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    PubMed

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

  7. The Z curve database: a graphic representation of genome sequences.

    PubMed

    Zhang, Chun-Ting; Zhang, Ren; Ou, Hong-Yu

    2003-03-22

    Genome projects for many prokaryotic and eukaryotic species have been completed and more new genome projects are being underway currently. The availability of a large number of genomic sequences for researchers creates a need to find graphic tools to study genomes in a perceivable form. The Z curve is one of such tools available for visualizing genomes. The Z curve is a unique three-dimensional curve representation for a given DNA sequence in the sense that each can be uniquely reconstructed given the other. The Z curve database for more than 1000 genomes have been established here. The database contains the Z curves for archaea, bacteria, eukaryota, organelles, phages, plasmids, viroids and viruses, whose genomic sequences are currently available. All the 3-dimensional Z curves and their three component curves are stored in the database. The applications of the Z curve database on comparative genomics, gene prediction, computation of G+C content with a windowless technique, prediction of replication origins and terminations of bacterial and archaeal genomes and study of local deviations from the Chargaff Parity Rule 2 etc. are presented in detail. The Z curve database reported here is a treasure trove in which biologists could find useful biological knowledge.

  8. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    PubMed Central

    Herniou, Elisabeth A.; Luque, Teresa; Chen, Xinwen; Vlak, Just M.; Winstanley, Doreen; Cory, Jennifer S.; O'Reilly, David R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses. PMID:11483757

  9. A 454 sequencing approach to dipteran mitochondrial genome research.

    PubMed

    Ramakodi, Meganathan P; Singh, Baneshwar; Wells, Jeffrey D; Guerrero, Felix; Ray, David A

    2015-01-01

    The availability of complete mitochondrial genome (mtgenome) data for Diptera, one of the largest metazoan orders, in public databases is limited. The advent of high throughput sequencing technology provides the potential to generate mtgenomes for many species affordably and quickly. However, these technologies need to be validated for dipterans as the members of this clade play important economic and research roles. Illumina and 454 sequencing platforms are widely used in genomic research involving non-model organisms. The Illumina platform has already been utilized for generating mitochondrial genomes without using conventional long range PCR for insects whereas the power of 454 sequencing for generating mitochondrial genome drafts without PCR has not yet been validated for insects. Thus, this study examines the utility of 454 sequencing approach for dipteran mtgenomic research. We generated complete or nearly complete mitochondrial genomes for Cochliomyia hominivorax, Haematobia irritans, Phormia regina and Sarcophaga crassipalpis using a 454 sequencing approach. Comparisons between newly obtained and existing assemblies for C. hominivorax and H. irritans revealed no major discrepancies and verified the utility of 454 sequencing for dipteran mitochondrial genomes. We also report the complete mitochondrial sequences for two forensically important flies, P. regina and S. crassipalpis, which could be used to provide useful information to legal personnel. Comparative analyses revealed that dipterans follow similar codon usage and nucleotide biases that could be due to mutational and selection pressures. This study illustrates the utility of 454 sequencing to obtain complete mitochondrial genomes for dipterans without the aid of conventional molecular techniques such as PCR and cloning and validates this method of mtgenome sequencing in arthropods.

  10. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    PubMed

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on an Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with the N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species.

  11. Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

    PubMed Central

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on an Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with the N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021

  12. Whole genome sequence analysis of BT-474 using complete Genomics' standard and long fragment read technologies.

    PubMed

    Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A

    2016-01-01

    The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.

  13. Complete Genome Sequence of Methanomassiliicoccus luminyensis, the Largest Genome of a Human-Associated Archaea Species

    PubMed Central

    Gorlas, Aurore; Robert, Catherine; Gimenez, Gregory; Drancourt, Michel

    2012-01-01

    The present study describes the complete and annotated genome sequence of Methanomassiliicoccus luminyensis strain B10 (DSM 24529T, CSUR P135), which was isolated from human feces. The 2.6-Mb genome represents the largest genome of a methanogenic euryarchaeon isolated from humans. The genome data of M. luminyensis reveal unique features and horizontal gene transfer events, which might have occurred during its adaptation and/or evolution in the human ecosystem. PMID:22887657

  14. Complete genome sequence of Methanomassiliicoccus luminyensis, the largest genome of a human-associated Archaea species.

    PubMed

    Gorlas, Aurore; Robert, Catherine; Gimenez, Gregory; Drancourt, Michel; Raoult, Didier

    2012-09-01

    The present study describes the complete and annotated genome sequence of Methanomassiliicoccus luminyensis strain B10 (DSM 24529(T), CSUR P135), which was isolated from human feces. The 2.6-Mb genome represents the largest genome of a methanogenic euryarchaeon isolated from humans. The genome data of M. luminyensis reveal unique features and horizontal gene transfer events, which might have occurred during its adaptation and/or evolution in the human ecosystem.

  15. Mig-6 Mouse Model of Endometrial Cancer.

    PubMed

    Kim, Tae Hoon; Yoo, Jung-Yoon; Jeong, Jae-Wook

    2017-01-01

    Endometrial cancer is a frequently occurring gynecological disorder. Estrogen-dependent endometrioid carcinoma is the most common type of gynecological cancer. One of the major pathologic phenomena of endometrial cancer is the loss of estrogen (E2) and progesterone (P4) control over uterine epithelial cell proliferation. P4 antagonizes the growth-promoting properties of E2 in the uterus. P4 prevents the development of endometrial cancer associated with unopposed E2 by blocking E2 actions. Mitogen inducible gene 6 (Mig-6, Errfi1, RALT, or gene 33) is an immediate early response gene that can be induced by various mitogens and common chronic stress stimuli. Mig-6 has been identified as an important component of P4-mediated inhibition of E2 signaling in the uterus. Decreased expression of MIG-6 is observed in human endometrial carcinomas. Transgenic mice with Mig-6 ablation in the uterus develop endometrial hyperplasia and E2-dependent endometrial cancer. Thus, MIG-6 has a tumor suppressor function in endometrial tumorigenesis. The following discussion summarizes our current knowledge of Mig-6 mouse models and their role in understanding the molecular mechanisms of endometrial tumorigenesis and in the development of therapeutic approaches for endometrial cancer.

  16. Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes

    PubMed Central

    Perkins, Timothy T.; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

    2013-01-01

    The fully annotated genome sequence of the European strain, 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publically available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high-error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology, however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori. PMID:23840736

  17. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576

  18. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  19. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis.

    PubMed

    Carlton, Jane M; Hirt, Robert P; Silva, Joana C; Delcher, Arthur L; Schatz, Michael; Zhao, Qi; Wortman, Jennifer R; Bidwell, Shelby L; Alsmark, U Cecilia M; Besteiro, Sébastien; Sicheritz-Ponten, Thomas; Noel, Christophe J; Dacks, Joel B; Foster, Peter G; Simillion, Cedric; Van de Peer, Yves; Miranda-Saavedra, Diego; Barton, Geoffrey J; Westrop, Gareth D; Müller, Sylke; Dessi, Daniele; Fiori, Pier Luigi; Ren, Qinghu; Paulsen, Ian; Zhang, Hanbang; Bastida-Corcuera, Felix D; Simoes-Barbosa, Augusto; Brown, Mark T; Hayes, Richard D; Mukherjee, Mandira; Okumura, Cheryl Y; Schneider, Rachel; Smith, Alias J; Vanacova, Stepanka; Villalvazo, Maria; Haas, Brian J; Pertea, Mihaela; Feldblyum, Tamara V; Utterback, Terry R; Shu, Chung-Li; Osoegawa, Kazutoyo; de Jong, Pieter J; Hrdy, Ivan; Horvathova, Lenka; Zubacova, Zuzana; Dolezal, Pavel; Malik, Shehre-Banoo; Logsdon, John M; Henze, Katrin; Gupta, Arti; Wang, Ching C; Dunne, Rebecca L; Upcroft, Jacqueline A; Upcroft, Peter; White, Owen; Salzberg, Steven L; Tang, Petrus; Chiu, Cheng-Hsun; Lee, Ying-Shiung; Embley, T Martin; Coombs, Graham H; Mottram, Jeremy C; Tachezy, Jan; Fraser-Liggett, Claire M; Johnson, Patricia J

    2007-01-12

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.

  20. Complete genome sequence of Serratia plymuthica strain AS12

    SciTech Connect

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne A.; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla L.; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, J. Chris; Land, Miriam L; Hauser, Loren John; Cheng, Jan-Fang; Ivanova, N; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  1. Complete genome sequence of Serratia plymuthica strain AS12

    PubMed Central

    Finlay, Roger D.; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

    2012-01-01

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”. PMID:22768360

  2. Widespread mitovirus sequences in plant genomes

    PubMed Central

    Warner, Benjamin E.; Yerramsetty, Pradeep

    2015-01-01

    The exploration of the evolution of RNA viruses has been aided recently by the discovery of copies of fragments or complete genomes of non-retroviral RNA viruses (Non-retroviral Endogenous RNA Viral Elements, or NERVEs) in many eukaryotic nuclear genomes. Among the most prominent NERVEs are partial copies of the RNA dependent RNA polymerase (RdRP) of the mitoviruses in plant mitochondrial genomes. Mitoviruses are in the family Narnaviridae, which are the simplest viruses, encoding only a single protein (the RdRP) in their unencapsidated viral plus strand. Narnaviruses are known only in fungi, and the origin of plant mitochondrial mitovirus NERVEs appears to be horizontal transfer from plant pathogenic fungi. At least one mitochondrial mitovirus NERVE, but not its nuclear copy, is expressed. PMID:25870770

  3. Phylogenomics from Whole Genome Sequences Using aTRAM.

    PubMed

    Allen, Julie M; Boyd, Bret; Nguyen, Nam-Phuong; Vachaspati, Pranjal; Warnow, Tandy; Huang, Daisie I; Grady, Patrick G S; Bell, Kayce C; Cronk, Quentin C B; Mugisha, Lawrence; Pittendrigh, Barry R; Leonardi, M Soledad; Reed, David L; Johnson, Kevin P

    2017-09-01

    Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes ($<$1000 Mbp) it is feasible to sequence the entire genome at modest coverage ($10-30\\times$). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage.Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above $5-10\\times$, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing

  4. Specialized microbial databases for inductive exploration of microbial genome sequences

    PubMed Central

    Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

    2005-01-01

    Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474

  5. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    PubMed Central

    Cronn, Richard; Liston, Aaron; Parks, Matthew; Gernandt, David S.; Shen, Rongkun; Mockler, Todd

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies, however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to simultaneously sequence multiple genomes using the Illumina Genome Analyzer. We PCR-amplified ∼120 kb plastomes from eight species (seven Pinus, one Picea) in 35 reactions. Pooled products were ligated to modified adapters that included 3 bp indexing tags and samples were multiplexed at four genomes per lane. Tagged microreads were assembled by de novo and reference-guided assembly methods, using previously published Pinus plastomes as surrogate references. Assemblies for these eight genomes are estimated at 88–94% complete, with an average sequence depth of 55× to 186×. Mononucleotide repeats interrupt contig assembly with increasing repeat length, and we estimate that the limit for their assembly is 16 bp. Comparisons to 37 kb of Sanger sequence show a validated error rate of 0.056%, and conspicuous errors are evident from the assembly process. This efficient sequencing approach yields high-quality draft genomes and should have immediate applicability to genomes with comparable complexity. PMID:18753151

  6. Genome sequence of the cultivated cotton Gossypium arboreum.

    PubMed

    Li, Fuguang; Fan, Guangyi; Wang, Kunbo; Sun, Fengming; Yuan, Youlu; Song, Guoli; Li, Qin; Ma, Zhiying; Lu, Cairui; Zou, Changsong; Chen, Wenbin; Liang, Xinming; Shang, Haihong; Liu, Weiqing; Shi, Chengcheng; Xiao, Guanghui; Gou, Caiyun; Ye, Wuwei; Xu, Xun; Zhang, Xueyan; Wei, Hengling; Li, Zhifang; Zhang, Guiyin; Wang, Junyi; Liu, Kun; Kohel, Russell J; Percy, Richard G; Yu, John Z; Zhu, Yu-Xian; Wang, Jun; Yu, Shuxun

    2014-06-01

    The complex allotetraploid nature of the cotton genome (AADD; 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled the Gossypium arboreum (AA; 2n = 26) genome, a putative contributor of the A subgenome. A total of 193.6 Gb of clean sequence covering the genome by 112.6-fold was obtained by paired-end sequencing. We further anchored and oriented 90.4% of the assembly on 13 pseudochromosomes and found that 68.5% of the genome is occupied by repetitive DNA sequences. We predicted 41,330 protein-coding genes in G. arboreum. Two whole-genome duplications were shared by G. arboreum and Gossypium raimondii before speciation. Insertions of long terminal repeats in the past 5 million years are responsible for the twofold difference in the sizes of these genomes. Comparative transcriptome studies showed the key role of the nucleotide binding site (NBS)-encoding gene family in resistance to Verticillium dahliae and the involvement of ethylene in the development of cotton fiber cells.

  7. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    PubMed

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-05-25

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  8. Decoding the genome beyond sequencing: the new phase of genomic research.

    PubMed

    Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

    2011-10-01

    While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Mycobacterium tuberculosis increases IP-10 and MIG protein despite inhibition of IP-10 and MIG transcription.

    PubMed

    Bai, Xiyuan; Chmura, Kathryn; Ovrutsky, Alida R; Bowler, Russell P; Scheinman, Robert I; Oberley-Deegan, Rebecca E; Liu, Haiying; Shang, Shaobin; Ordway, Diane; Chan, Edward D

    2011-01-01

    Mycobacterium tuberculosis (MTB) has evolved methods to evade interferon-gamma (IFNγ) mediated protection. We sought to determine the effect of MTB infection on expression of IFNγ-inducible Protein 10 (IP-10) and Monokine Induced by IFNγ (MIG), two chemokines involved in host defense. MTB infection of THP-1 cells inhibited the transcription of IP-10 and MIG. A key mechanism for the inhibition is the disruption of binding of Signal Transduction and Activation of Transcription 1-alpha (STAT1α) to its cis-regulatory element, present in the 5'-flanking region of both IP-10 and MIG promoters. Use of inhibitors specific to the nuclear factor-kappa B (NFκB) and p38 mitogen-activated protein kinase (p38(mapk)) implicate these two signaling pathways in mediating the effect of MTB on the inhibition of IFNγ-induced IP-10 and MIG mRNA expression. Interestingly, despite transcriptional inhibition, there was an unexpected increase in IP-10 and MIG protein production after combined IFNγ and MTB stimulation. MTB also inhibited IFNγ induction of MIG mRNA but augmented MIG protein in primary human monocyte-derived macrophages. The synergy between MTB and IFNγ in the induction of IP-10 and MIG protein appears to involve novel post-transcriptional events that incorporates non-canonical functions of NFκB and p38(mapk). Published by Elsevier Ltd.

  10. The sheep genome reference sequence: a work in progress.

    PubMed

    Archibald, A L; Cockett, N E; Dalrymple, B P; Faraut, T; Kijas, J W; Maddox, J F; McEwan, J C; Hutton Oddy, V; Raadsma, H W; Wade, C; Wang, J; Wang, W; Xun, X

    2010-10-01

    Until recently, the construction of a reference genome was performed using Sanger sequencing alone. The emergence of next-generation sequencing platforms now means reference genomes may incorporate sequence data generated from a range of sequencing platforms, each of which have different read length, systematic biases and mate-pair characteristics. The objective of this review is to inform the mammalian genomics community about the experimental strategy being pursued by the International Sheep Genomics Consortium (ISGC) to construct the draft reference genome of sheep (Ovis aries). Component activities such as data generation, sequence assembly and annotation are described, along with information concerning the key researchers performing the work. This aims to foster future participation from across the research community through the coordinated activities of the consortium. The review also serves as a 'marker paper' by providing information concerning the pre-publication release of the reference genome. This ensures the ISGC adheres to the framework for data sharing established at the recent Toronto International Data Release Workshop and provides guidelines for data users.

  11. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  12. Contrasting DNA sequence organisation patterns in sauropsidian genomes.

    PubMed

    Epplen, J T; Diedrich, U; Wagenmann, M; Schmidtke, J; Engel, W

    1979-11-01

    The genomic DNA organisation patterns of four sauropsidian species, namely Python reticularis, Caiman crocodilus, Terrapene carolina triungius and Columba livia domestica were investigated by reassociation of short and long DNA fragments, by hyperchromicity measurements of reannealed fragments and by length estimations of S1-nuclease resistant repetitive duplexes. While the genomic DNA of the three reptilian species shows a short period interspersion pattern, the genome of the avian species is organised in a long period interspersion pattern apparently typical for birds. These findings are discussed in view of the close phylogenetic relationships of birds and reptiles, and also with regard to a possible relationship between the extent of sequence interspersion and genome size.

  13. Draft genome sequences of 21 Salmonella enterica serovar enteritidis strains.

    PubMed

    Timme, Ruth E; Allard, Marc W; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E; Zheng, Jie; Stones, Robert; Wilson, Mark R; Musser, Steven M; Brown, Eric W

    2012-11-01

    Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.

  14. Complete genome sequence of Shewanella putrefaciens. Final report

    SciTech Connect

    Heidelberg, John F.

    2001-04-01

    Seventy percent of the costs for genome sequencing Shewanella putrefaciens (oneidensis) were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reduction that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

  15. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon.

  16. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    SciTech Connect

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla L.; Ivanova, N; Copeland, A; Lucas, Susan; Glavina Del Rio, Tijana; Chen, Feng; Sims, David; Meincke, Linda; Bruce, David; Goodwin, Lynne A.; Han, Cliff; Detter, J. Chris; Ovchinnikova, Galina; Pati, Amrita; Mavromatis, K; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sproer, Cathrin; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chain, Patrick S. G.

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Complete genome sequence of Pirellula staleyi type strain (ATCC 27377).

    PubMed

    Clum, Alicia; Tindall, Brian J; Sikorski, Johannes; Ivanova, Natalia; Mavrommatis, Konstantinos; Lucas, Susan; Glavina, Tijana; Del Rio; Nolan, Matt; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Brettin, Thomas; Han, Cliff; Detter, John C; Kuske, Cheryl; Bruce, David; Goodwin, Lynne; Ovchinikova, Galina; Pati, Amrita; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2009-12-30

    Pirellula staleyi Schlesner and Hirsch 1987 is the type species of the genus Pirellula of the family Planctomycetaceae. Members of this pear- or teardrop-shaped bacterium show a clearly visible pointed attachment pole and can be distinguished from other Planctomycetes by a lack of true stalks. Strains closely related to the species have been isolated from fresh and brackish water, as well as from hypersaline lakes. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the order Planctomyces and only the second sequence from the phylum Planctobacteria/Planctomycetes. The 6,196,199 bp long genome with its 4773 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Long-read sequence assembly of the gorilla genome

    PubMed Central

    Gordon, David; Huddleston, John; Chaisson, Mark J. P.; Hill, Christopher M.; Kronenberg, Zev N.; Munson, Katherine M.; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W.; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K.; Haussler, David; Chin, Chen-Shan; Eichler, Evan E.

    2016-01-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  19. Long-read sequence assembly of the gorilla genome.

    PubMed

    Gordon, David; Huddleston, John; Chaisson, Mark J P; Hill, Christopher M; Kronenberg, Zev N; Munson, Katherine M; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K; Haussler, David; Chin, Chen-Shan; Eichler, Evan E

    2016-04-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.

  20. Complete genome sequence of Kangiella koreensis type strain (SW-125).

    PubMed

    Han, Cliff; Sikorski, Johannes; Lapidus, Alla; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Copeland, Alex; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Brettin, Thomas; Göker, Markus; Tindall, Brian J; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

    2009-11-22

    Kangiella koreensis (Yoon et al. 2004) is the type species of the genus and is of phylogenetic interest because of the very isolated location of the genus Kangiella in the gammaproteobacterial order Oceanospirillales. K. koreensis SW-125(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from tidal flat sediments at Daepo Beach, Yellow Sea, Korea. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first completed genome sequence from the genus Kangiella and only the fourth genome from the order Oceanospirillales. This 2,852,073 bp long single replicon genome with its 2647 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Haliangium ochraceum type strain (SMP-2).

    PubMed

    Ivanova, Natalia; Daum, Chris; Lang, Elke; Abt, Birte; Kopitz, Markus; Saunders, Elizabeth; Lapidus, Alla; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Pati, Amrita; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Markowitz, Victor; Eisen, Jonathan A; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-28

    Haliangium ochraceum Fudou et al. 2002 is the type species of the genus Haliangium in the myxococcal family 'Haliangiaceae'. Members of the genus Haliangium are the first halophilic myxobacterial taxa described. The cells of the species follow a multicellular lifestyle in highly organized biofilms, called swarms, they decompose bacterial and yeast cells as most myxobacteria do. The fruiting bodies contain particularly small coccoid myxospores. H. ochraceum encodes the first actin homologue identified in a bacterial genome. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the myxococcal suborder Nannocystineae, and the 9,446,314 bp long single replicon genome with its 6,898 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

    PubMed

    Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

    2006-11-01

    The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice have developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.

  3. Complete Genome Sequence of Spiroplasma sp. NBRC 100390

    PubMed Central

    Haryono, Mindia; Lo, Wen-Sui; Gasparich, Gail E.

    2017-01-01

    ABSTRACT Spiroplasma sp. NBRC 100390 was initially described as a duplicate of S. atrichopogonis GNAT3597T (=ATCC BAA-520T) but later found to be different in the 16S rDNA sequences. Here, we report the complete genome sequence of this bacterium to establish its identity and to facilitate future investigation. PMID:28280009

  4. Sequencing the Genome of the Heirloom Watermelon Cultivar Charleston Gray

    USDA-ARS?s Scientific Manuscript database

    The genome of the watermelon cultivar Charleston Gray, a major heirloom which has been used in breeding programs of many watermelon cultivars, was sequenced. Our strategy involved a hybrid approach using the Illumina and 454/Titanium next-generation sequencing technologies. For Illumina, shotgun g...

  5. Genome sequence of Stachybotrys chartarum Strain 51-11

    EPA Science Inventory

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  6. Complete genome sequence of a new maize-associated cytorhabdovirus

    USDA-ARS?s Scientific Manuscript database

    A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....

  7. Complete Genome Sequence of the Alfalfa latent virus

    PubMed Central

    Shao, Jonathan; Postnikova, Olga A.

    2015-01-01

    The first complete genome sequence of the Alfalfa latent carlavirus (ALV) was obtained by primer walking and Illumina RNA sequencing. The virus differs substantially from the Czech ALV isolate and the Pea streak virus isolate from Wisconsin. The absence of a clear nucleic acid-binding protein indicates ALV divergence from other carlaviruses. PMID:25883281

  8. Complete Genome Sequence of Kocuria palustris MU14/1

    PubMed Central

    Foecking, Mark F.

    2015-01-01

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences. PMID:26472837

  9. Genome sequence of Stachybotrys chartarum Strain 51-11

    EPA Science Inventory

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  10. Draft Genome Sequence of Aeromonas sp. Strain EERV15

    PubMed Central

    Ehsani, Elham; Barrantes, Israel; Vandermaesen, Johanna; Geffers, Robert; Jarek, Michael; Boon, Nico; Springael, Dirk; Pieper, Dietmar H.

    2016-01-01

    We report here the draft genome sequence of Aeromonas sp. strain EERV15 isolated from sand filter. The organism most closely related to Aeromonas sp. EERV15 is Aeromonas veronii B565, with an average 83% amino acid sequence similarity of putatively encoded protein open reading frames. PMID:27540061

  11. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  12. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    SciTech Connect

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  13. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873

  14. Draft Genome Sequence of Mycobacterium colombiense

    PubMed Central

    Bouam, Amar; Robert, Catherine; Levasseur, Anthony

    2017-01-01

    ABSTRACT Mycobacterium colombiense is a rapidly growing mycobacterium initially isolated from the blood of an HIV-positive patient in Colombia. Its 5,854,893-bp draft genome exhibits a G+C content of 67.64%, 5,233 protein-coding genes, and 54 predicted RNA genes. PMID:28385843

  15. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  16. Revisiting the sequencing of the first tree genome: Populus trichocarpa.

    PubMed

    Wullschleger, Stan D; Weston, D J; DiFazio, S P; Tuskan, G A

    2013-04-01

    Ten years ago, it was announced that the Joint Genome Institute with funds provided by the Department of Energy, Office of Science, Biological and Environmental Research would sequence the black cottonwood (Populus trichocarpa Torr. & Gray) genome. This landmark decision was the culmination of work by the forest science community to develop Populus as a model system. Since its public release in late 2006, the availability of the Populus genome has spawned research in plant biology, morphology, genetics and ecology. Here we address how the tree physiologist has used this resource. More specifically, we revisit our earlier contention that the rewards of sequencing the Populus genome would depend on how quickly scientists working with woody perennials could adopt molecular approaches to investigate the mechanistic underpinnings of basic physiological processes. Several examples illustrate the integration of functional and comparative genomics into the forest sciences, especially in areas that target improved understanding of the developmental differences between woody perennials and herbaceous annuals (e.g., phase transitions). Sequencing the Populus genome and the availability of genetic and genomic resources has also been instrumental in identifying candidate genes that underlie physiological and morphological traits of interest. Genome-enabled research has advanced our understanding of how phenotype and genotype are related and provided insights into the genetic mechanisms whereby woody perennials adapt to environmental stress. In the future, we anticipate that low-cost, high-throughput sequencing will continue to facilitate research in tree physiology and enhance our understanding at scales of individual organisms and populations. A challenge remains, however, as to how genomic resources, including the Populus genome, can be used to understand ecosystem function. Although examples are limited, progress in this area is encouraging and will undoubtedly improve as

  17. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.

  18. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  19. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172

  20. Genome sequencing of a single tardigrade Hypsibius dujardini individual

    PubMed Central

    Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

    2016-01-01

    Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies. PMID:27529330

  1. Genome sequencing of a single tardigrade Hypsibius dujardini individual.

    PubMed

    Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

    2016-08-16

    Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.

  2. Genome Sequence of Luminous Piezophile Photobacterium phosphoreum ANT-2200

    PubMed Central

    Zhang, Sheng-Da; Barbe, Valérie; Garel, Marc; Zhang, Wei-Jia; Chen, Haitao; Santini, Claire-Lise; Murat, Dorothée; Jing, Hongmei; Zhao, Yuan; Lajus, Aurélie; Martini, Séverine; Pradel, Nathalie; Tamburini, Christian

    2014-01-01

    Bacteria of the genus Photobacterium thrive worldwide in oceans and show substantially varied lifestyles, including free-living, commensal, pathogenic, symbiotic, and piezophilic. Here, we present the genome sequence of a luminous, piezophilic Photobacterium phosphoreum strain, ANT-2200, isolated from a water column at 2,200 m depth in the Mediterranean Sea. It is the first genomic sequence of the P. phosphoreum group. An analysis of the sequence provides insight into the adaptation of bacteria to the deep-sea habitat. PMID:24744322

  3. [Organization of simple sequences in the Drosophilia melanoga ter genome].

    PubMed

    Vashakidze, R P; Mamulashvili, N A; Kalandarishvili, K G; Kolchinskiĭ, A M; Zaalishvili, M M

    1988-01-01

    Fragments of Drosophila melanogaster DNA that intensively hybridize with simple sequences poly[(dG-dT).(dC-dA)], poly[(dA).(dT)] and poly[(dG-dA).(dC-dT)] were cloned. The first two types of simple sequences are organized in these clones as separated stretches of moderate length, repeated many times within 12-15 kb. Each cluster contains only one type of the simple sequences and originates from a unique in the genome. In contrast, poly[(dG-dA).(dC-dT)] occurs in the genome as several isolated motifs.

  4. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  5. Next-generation sequencing and large genome assemblies.

    PubMed

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-06-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.

  6. Genomic sequencing in clinical practice: applications, challenges, and opportunities

    PubMed Central

    Krier, Joel B.; Kalia, Sarah S.; Green, Robert C.

    2016-01-01

    The development of massively parallel sequencing (or next-generation sequencing) has facilitated a rapid implementation of genomic sequencing in clinical medicine. Genomic sequencing (GS) is now an essential tool for evaluating rare disorders, identifying therapeutic targets in neoplasms, and screening for prenatal aneuploidy. Emerging applications, such as GS for preconception carrier screening and predisposition screening in healthy individuals, are being explored in research settings and utilized by members of the public eager to incorporate genomic information into their health management. The rapid pace of adoption has created challenges for all stakeholders in clinical GS, from standardizing variant interpretation approaches in clinical molecular laboratories to ensuring that nongeneticist clinicians are prepared for new types of clinical information. Clinical GS faces a pivotal moment, as the vast potential of new quantities and types of data enable further clinical innovation and complicated implementation questions continue to be resolved. PMID:27757064

  7. Genomic sequencing in clinical practice: applications, challenges, and opportunities.

    PubMed

    Krier, Joel B; Kalia, Sarah S; Green, Robert C

    2016-09-01

    The development of massively parallel sequencing (or next-generation sequencing) has facilitated a rapid implementation of genomic sequencing in clinical medicine. Genomic sequencing (GS) is now an essential tool for evaluating rare disorders, identifying therapeutic targets in neoplasms, and screening for prenatal aneuploidy. Emerging applications, such as GS for preconception carrier screening and predisposition screening in healthy individuals, are being explored in research settings and utilized by members of the public eager to incorporate genomic information into their health management. The rapid pace of adoption has created challenges for all stakeholders in clinical GS, from standardizing variant interpretation approaches in clinical molecular laboratories to ensuring that nongeneticist clinicians are prepared for new types of clinical information. Clinical GS faces a pivotal moment, as the vast potential of new quantities and types of data enable further clinical innovation and complicated implementation questions continue to be resolved.

  8. [DNA analysis for the post genome-sequencing era].

    PubMed

    Kambara, Hideki

    2002-05-01

    With the completion of the human genome sequencing, the new post genome-sequencing era has started. The major subjects are clarifying the function of genes to apply this information to medical as well as various industrial fields. Various DNA analysis methods and instruments for gene expression profiling as well as genetic diversity including SNPs typing are required and have been developed. Here, the history and technologies related to DNA analysis including the Wada project in the early 1980's, and the Human genome project from 1990 are described. Various new technologies have developed in this decade. They include a capillary gel array DNA sequencer, DNA chips, bead probe arrays, a new DNA sequencing method using pyrosequencing and an efficient SNP typing method by BAMPER.

  9. Next-generation sequencing and large genome assemblies

    PubMed Central

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed. PMID:22676195

  10. A physical map of the papaya genome with integrated genetic map and genome sequence

    PubMed Central

    2009-01-01

    Background Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly. Results The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the adjacent region of the male specific region of the Y chromosome (MSY), and recombination rates were recovered gradually and then exceeded the genome average. Recombination hotspots were observed at about 10 Mb away on both sides of the MSY, showing 7-fold increase compared with the genome wide average, demonstrating the dynamics of recombination of the sex chromosomes. Conclusion A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. The integrated map facilitated the draft genome assembly

  11. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    NASA Astrophysics Data System (ADS)

    Chechetkin, V. R.; Lobzin, V. V.

    2004-07-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions.

  12. Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

    PubMed Central

    Bergman, Casey M; Pfeiffer, Barret D; Rincón-Limas, Diego E; Hoskins, Roger A; Gnirke, Andreas; Mungall, Chris J; Wang, Adrienne M; Kronmiller, Brent; Pacleb, Joanne; Park, Soo; Stapleton, Mark; Wan, Kenneth; George, Reed A; de Jong, Pieter J; Botas, Juan; Rubin, Gerald M; Celniker, Susan E

    2002-01-01

    Background It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. Conclusions Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone. PMID:12537575

  13. Sequence-based mapping of the polyploid wheat genome.

    PubMed

    Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

    2013-07-08

    The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40-100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.

  14. Pervasive sequence patents cover the entire human genome

    PubMed Central

    2013-01-01

    The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays. PMID:23522065

  15. Next Generation DNA Sequencing and the Future of Genomic Medicine

    PubMed Central

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis. PMID:24710010

  16. Draft genome sequence of the Tibetan antelope.

    PubMed

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M; Ma, Jian; Lewin, Harris A; Xing, Jinchuan; Platt, Roy N; Ray, David A; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika, and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

  17. Draft genome sequence of the Tibetan antelope

    PubMed Central

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M.; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M.; Ma, Jian; Lewin, Harris A.; Xing, Jinchuan; Platt, Roy N.; Ray, David A.; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W.; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika, and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation. PMID:23673643

  18. Genome Sequence of Mycobacteriophage ErnieJ.

    PubMed

    Robinson, Courtney J; London-Thomas, Laricca Y; Dickson, Leon A; Clinton, Tiffany A; Baig, Hana; Bute, Maude; Fahad, Mohammed; Farrakhan, Kanhai; Grady, Neshaun; Guthrie, Nicholas E; Hafid, Ruoa; Harvey, Jayla; Hunnicutt, Kellie; Larsen, Victoria L; McDuffie, Taashaylaray; McGee, Earyn N; Pailin, Jillian Y; Peacock, Bria; Thomas, Antolice; Anderson, Winston A

    2016-11-23

    ErnieJ, a cluster C mycobacteriophage that infects Mycobacterium smegmatis mc(2)155, was recovered from soil in Washington, DC. Its genome is 153,243 bp in size and encodes 227 predicted proteins, 30 tRNAs, and one transfer-messenger RNA (tmRNA). Ten percent of the predicted proteins have homologs in phages that infect nonmycobacterial Actinobacteria.

  19. Genome Sequence of Mycobacteriophage ErnieJ

    PubMed Central

    London-Thomas, Laricca Y.; Dickson, Leon A.; Clinton, Tiffany A.; Baig, Hana; Bute, Maude; Fahad, Mohammed; Farrakhan, Kanhai; Grady, Neshaun; Guthrie, Nicholas E.; Hafid, Ruoa; Harvey, Jayla; Hunnicutt, Kellie; Larsen, Victoria L.; McDuffie, Taashaylaray; McGee, Earyn N.; Pailin, Jillian Y.; Peacock, Bria; Thomas, Antolice; Anderson, Winston A.

    2016-01-01

    ErnieJ, a cluster C mycobacteriophage that infects Mycobacterium smegmatis mc2155, was recovered from soil in Washington, DC. Its genome is 153,243 bp in size and encodes 227 predicted proteins, 30 tRNAs, and one transfer-messenger RNA (tmRNA). Ten percent of the predicted proteins have homologs in phages that infect nonmycobacterial Actinobacteria. PMID:27881532

  20. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-03

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age-and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research.

  1. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  2. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    PubMed

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius , is a representative icefish species with economic importance and special appearance. Due to its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from the Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and generated a draft genome assembly, aiming at exploring molecular mechanisms for the biological interests. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with the scaffold N50 reaching 1.163 Mb. The genome completeness was estimated to be 98.39 % by using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters.

  3. The Diploid Genome Sequence of an Individual Human

    PubMed Central

    Levy, Samuel; Sutton, Granger; Ng, Pauline C; Feuk, Lars; Halpern, Aaron L; Walenz, Brian P; Axelrod, Nelson; Huang, Jiaqi; Kirkness, Ewen F; Denisov, Gennady; Lin, Yuan; MacDonald, Jeffrey R; Pang, Andy Wing Chun; Shago, Mary; Stockwell, Timothy B; Tsiamouri, Alexia; Bafna, Vineet; Bansal, Vikas; Kravitz, Saul A; Busam, Dana A; Beeson, Karen Y; McIntosh, Tina C; Remington, Karin A; Abril, Josep F; Gill, John; Borman, Jon; Rogers, Yu-Hui; Frazier, Marvin E; Scherer, Stephen W; Strausberg, Robert L; Venter, J. Craig

    2007-01-01

    Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information. PMID:17803354

  4. The diploid genome sequence of an individual human.

    PubMed

    Levy, Samuel; Sutton, Granger; Ng, Pauline C; Feuk, Lars; Halpern, Aaron L; Walenz, Brian P; Axelrod, Nelson; Huang, Jiaqi; Kirkness, Ewen F; Denisov, Gennady; Lin, Yuan; MacDonald, Jeffrey R; Pang, Andy Wing Chun; Shago, Mary; Stockwell, Timothy B; Tsiamouri, Alexia; Bafna, Vineet; Bansal, Vikas; Kravitz, Saul A; Busam, Dana A; Beeson, Karen Y; McIntosh, Tina C; Remington, Karin A; Abril, Josep F; Gill, John; Borman, Jon; Rogers, Yu-Hui; Frazier, Marvin E; Scherer, Stephen W; Strausberg, Robert L; Venter, J Craig

    2007-09-04

    Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels)(1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

  5. The maize genome as a model for efficient sequence analysis of large plant genomes.

    PubMed

    Rabinowicz, Pablo D; Bennetzen, Jeffrey L

    2006-04-01

    The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of repetitive elements in intergenic regions. High-quality sequences of the relatively small genomes of Arabidopsis (0.14 Gbp) and rice (0.4 Gbp) have now been largely completed. The sequencing of plant genomes that have a more representative size (the mean for flowering plant genomes is 5.6 Gbp) has been seen as a daunting task, partly because of their size and partly because of the numerous highly conserved repeats. Nevertheless, creative strategies and powerful new tools have been generated recently in the plant genetics community, so that sequencing large plant genomes is now a realistic possibility. Maize (2.4-2.7 Gbp) will be the first gigabase-size plant genome to be sequenced using these novel approaches. Pilot studies on maize indicate that the new gene-enrichment, gene-finishing and gene-orientation technologies are efficient, robust and comprehensive. These strategies will succeed in sequencing the gene-space of large genome plants, and in locating all of these genes and adjacent sequences on the genetic and physical maps.

  6. Comprehensive genome sequence analysis of a breast cancer amplicon.

    PubMed

    Collins, C; Volik, S; Kowbel, D; Ginzinger, D; Ylstra, B; Cloutier, T; Hawkins, T; Predki, P; Martin, C; Wernick, M; Kuo, W L; Alberts, A; Gray, J W

    2001-06-01

    Gene amplification occurs in most solid tumors and is associated with poor prognosis. Amplification of 20q13.2 is common to several tumor types including breast cancer. The 1 Mb of sequence spanning the 20q13.2 breast cancer amplicon is one of the most exhaustively studied segments of the human genome. These studies have included amplicon mapping by comparative genomic hybridization (CGH), fluorescent in-situ hybridization (FISH), array-CGH, quantitative microsatellite analysis (QUMA), and functional genomic studies. Together these studies revealed a complex amplicon structure suggesting the presence of at least two driver genes in some tumors. One of these, ZNF217, is capable of immortalizing human mammary epithelial cells (HMEC) when overexpressed. In addition, we now report the sequencing of this region in human and mouse, and on quantitative expression studies in tumors. Amplicon localization now is straightforward and the availability of human and mouse genomic sequence facilitates their functional analysis. However, comprehensive annotation of megabase-scale regions requires integration of vast amounts of information. We present a system for integrative analysis and demonstrate its utility on 1.2 Mb of sequence spanning the 20q13.2 breast cancer amplicon and 865 kb of syntenic murine sequence. We integrate tumor genome copy number measurements with exhaustive genome landscape mapping, showing that amplicon boundaries are associated with maxima in repetitive element density and a region of evolutionary instability. This integration of comprehensive sequence annotation, quantitative expression analysis, and tumor amplicon boundaries provide evidence for an additional driver gene prefoldin 4 (PFDN4), coregulated genes, conserved noncoding regions, and associate repetitive elements with regions of genomic instability at this locus.

  7. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of 1-3 nucleotides combination. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSRs repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plants species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of plant kingdom and the repeat number increased in terrestrial plants species. This trend was more obvious in dicotyledon than monocotyledon. The percentage of CG/GC showed the opposite pattern to the AT/TA. Forth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of plant kingdom; meanwhile with the increase of SSRs repeat number in plants species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that, SSRs not only had the general pattern in the evolution of plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of the plant genome evolution.

  8. Sequencing the nuclear genome of the extinct woolly mammoth.

    PubMed

    Miller, Webb; Drautz, Daniela I; Ratan, Aakrosh; Pusey, Barbara; Qi, Ji; Lesk, Arthur M; Tomsho, Lynn P; Packard, Michael D; Zhao, Fangqing; Sher, Andrei; Tikhonov, Alexei; Raney, Brian; Patterson, Nick; Lindblad-Toh, Kerstin; Lander, Eric S; Knight, James R; Irzyk, Gerard P; Fredrikson, Karin M; Harkins, Timothy T; Sheridan, Sharon; Pringle, Tom; Schuster, Stephan C

    2008-11-20

    In 1994, two independent groups extracted DNA from several Pleistocene epoch mammoths and noted differences among individual specimens. Subsequently, DNA sequences have been published for a number of extinct species. However, such ancient DNA is often fragmented and damaged, and studies to date have typically focused on short mitochondrial sequences, never yielding more than a fraction of a per cent of any nuclear genome. Here we describe 4.17 billion bases (Gb) of sequence from several mammoth specimens, 3.3 billion (80%) of which are from the woolly mammoth (Mammuthus primigenius) genome and thus comprise an extensive set of genome-wide sequence from an extinct species. Our data support earlier reports that elephantid genomes exceed 4 Gb. The estimated divergence rate between mammoth and African elephant is half of that between human and chimpanzee. The observed number of nucleotide differences between two particular mammoths was approximately one-eighth of that between one of them and the African elephant, corresponding to a separation between the mammoths of 1.5-2.0 Myr. The estimated probability that orthologous elephant and mammoth amino acids differ is 0.002, corresponding to about one residue per protein. Differences were discovered between mammoth and African elephant in amino-acid positions that are otherwise invariant over several billion years of combined mammalian evolution. This study shows that nuclear genome sequencing of extinct species can reveal population differences not evident from the fossil record, and perhaps even discover genetic factors that affect extinction.

  9. The exceptional genomic word symmetry along DNA sequences.

    PubMed

    Afreixo, Vera; Rodrigues, João M O S; Bastos, Carlos A C; Silva, Raquel M

    2016-02-03

    The second Chargaff's parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. The global exceptional symmetry was detected in long and short genomes. To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group. We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.

  10. Genome sequence of the pea aphid Acyrthosiphon pisum.

    PubMed

    2010-02-23

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

  11. Genome Sequence of the Pea Aphid Acyrthosiphon pisum

    PubMed Central

    2010-01-01

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

  12. Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome.

    PubMed

    Higashino, Atsunori; Sakate, Ryuichi; Kameoka, Yosuke; Takahashi, Ichiro; Hirata, Makoto; Tanuma, Reiko; Masui, Tohru; Yasutomi, Yasuhiro; Osada, Naoki

    2012-07-02

    The genetic background of the cynomolgus macaque (Macaca fascicularis) is made complex by the high genetic diversity, population structure, and gene introgression from the closely related rhesus macaque (Macaca mulatta). Herein we report the whole-genome sequence of a Malaysian cynomolgus macaque male with more than 40-fold coverage, which was determined using a resequencing method based on the Indian rhesus macaque genome. We identified approximately 9.7 million single nucleotide variants (SNVs) between the Malaysian cynomolgus and the Indian rhesus macaque genomes. Compared with humans, a smaller nonsynonymous/synonymous SNV ratio in the cynomolgus macaque suggests more effective removal of slightly deleterious mutations. Comparison of two cynomolgus (Malaysian and Vietnamese) and two rhesus (Indian and Chinese) macaque genomes, including previously published macaque genomes, suggests that Indochinese cynomolgus macaques have been more affected by gene introgression from rhesus macaques. We further identified 60 nonsynonymous SNVs that completely differentiated the cynomolgus and rhesus macaque genomes, and that could be important candidate variants for determining species-specific responses to drugs and pathogens. The demographic inference using the genome sequence data revealed that Malaysian cynomolgus macaques have experienced at least three population bottlenecks. This list of whole-genome SNVs will be useful for many future applications, such as an array-based genotyping system for macaque individuals. High-quality whole-genome sequencing of the cynomolgus macaque genome may aid studies on finding genetic differences that are responsible for phenotypic diversity in macaques and may help control genetic backgrounds among individuals.

  13. Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms.

    PubMed

    Jeong, Haeyoung; Lee, Dae-Hee; Ryu, Choong-Min; Park, Seung-Hwan

    2016-01-01

    PacBio's long-read sequencing technologies can be successfully used for a complete bacterial genome assembly using recently developed non-hybrid assemblers in the absence of secondgeneration, high-quality short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler resulted in nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced non-compatible results requiring post processing. The other two platforms further improved the PacBio assembly through scaffolding and a final error correction.

  14. Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii

    PubMed Central

    2011-01-01

    Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution, and mammalian reproduction and development. See research article: http://genomebiology.com/2011/12/8/R81 PMID:21861852

  15. The clinical potential and challenges of sequencing cancer genomes for personalized medical genomics.

    PubMed

    Cloonan, Nicole; Waddell, Nic; Grimmond, Sean M

    2010-11-01

    Next-generation sequencing is revolutionizing the way in which genomic-scale biological research is performed, and its effects are beginning to be translated medically. Large-scale international collaborations for the comprehensive sequencing of the genome, epigenome, and transcriptomes of cancers and corresponding 'normal' (germ-line) DNA are heralding the start of personalized medical genomics. The promise of eliminating conjecture when determining treatment approaches is certainly appealing for both patients and clinicians; however, several major issues must be resolved before next-generation sequencing will be adopted as a routine clinical tool for patients. This feature review explores the clinical potential and challenges of studying cancer genomes for personalized medical genomics.

  16. Genomic Sequence or Signature Tags (GSTs) from the Genome Group at Brookhaven National Laboratory (BNL)

    DOE Data Explorer

    Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K.

    Genomic Signature Tags (GSTs) are the products of a method we have developed for identifying and quantitatively analyzing genomic DNAs. The DNA is initially fragmented with a type II restriction enzyme. An oligonucleotide adaptor containing a recognition site for MmeI, a type IIS restriction enzyme, is then used to release 21-bp tags from fixed positions in the DNA relative to the sites recognized by the fragmenting enzyme. These tags are PCR-amplified, purified, concatenated and then cloned and sequenced. The tag sequences and abundances are used to create a high resolution GST sequence profile of the genomic DNA. [Quoted from Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA, Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K., Revised 9/13/2002

  17. Complete mitochondrial genome sequence of Cheirotonus jansoni (Coleoptera: Scarabaeidae).

    PubMed

    Shao, L L; Huang, D Y; Sun, X Y; Hao, J S; Cheng, C H; Zhang, W; Yang, Q

    2014-02-20

    We sequenced the complete mitochondrial genome (mitogenome) of Cheirotonus jansoni (Coleoptera: Scarabaeidae), an endangered insect species from Southeast Asia. This long legged scarab is widely collected and reared for sale, although it is rare and protected in the wild. The circular genome is 17,249 bp long and contains a typical gene complement: 13 protein-coding genes, 2 rRNA genes, 22 putative tRNA genes, and a non-coding AT-rich region. Its gene order and arrangement are identical to the common type found in most insect mitogenomes. As with all other sequenced coleopteran species, a 5-bp long TAGTA motif was detected in the intergenic space sequence located between trnS(UCN) and nad1. The atypical cox1 start codon is AAC, and the putative initiation codon for the atp8 gene appears to be GTC, instead of the frequently found ATN. By sequence comparison, the 2590-bp long non-coding AT-rich region is the second longest among the coleopterans, with two tandem repeat regions: one is 10 copies of an 88-bp sequence and the other is 2 copies of a 153-bp sequence. Additionally, the A+T content (64%) of the 13 protein-coding genes is the lowest among all sequenced coleopteran species. This newly sequenced genome aids in our understanding of the comparative biology of the mitogenomes of coleopteran species and supplies important data for the conservation of this species.

  18. Genomic Sequence of Canine Papillomavirus 19

    PubMed Central

    Tisza, Michael J.; Yuan, Hang; Schlegel, Richard

    2016-01-01

    It is generally assumed that individual papillomas (warts) are caused by infection with individual papillomavirus types. Deep sequencing of virions extracted from a canine oral papilloma revealed the presence of canine papillomavirus 1 (CPV1), CPV2, and a novel canine papillomavirus, CPV19. This suggests that papillomas sometimes harbor multiple viral species. PMID:27932663

  19. Draft genome sequence of Bacillus endophyticus 2102.

    PubMed

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung; Jeong, Haeyoung; Lee, Dong-Woo

    2012-10-01

    Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents.

  20. 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects.

    PubMed

    Kumar, Sujai; Schiffer, Philipp H; Blaxter, Mark

    2012-01-01

    Genome sequencing has been democratized by second-generation technologies, and even small labs can sequence metazoan genomes now. In this article, we describe '959 Nematode Genomes'--a community-curated semantic wiki to coordinate the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated with ease compared with a conventional database-driven web site. The wiki also provides a way to track and share preliminary data if those data are not polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. www.nematodegenomes.org.

  1. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    PubMed Central

    Mavromatis, Konstantinos; Sikorski, Johannes; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Thomas; Detter, John C.; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rüdiger; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family ‘Alicyclobacillaceae’. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family ‘Alicyclobacillaceae’. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304673

  2. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    PubMed

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Complete genome sequence of Arthrobacter sp. strain FB24

    SciTech Connect

    Nakatsu, C. H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, T.; Han, Cliff F.; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-09-30

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

  4. Complete genome sequence of Arthrobacter sp. strain FB24

    PubMed Central

    Nakatsu, Cindy H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, Thomas; Han, Cliff; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-01-01

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program. PMID:24501649

  5. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33).

    PubMed

    Munk, A Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2011-07-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The species is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    SciTech Connect

    Mavromatis, K; Sikorski, Johannes; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Tom; Detter, J C; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rudiger; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family Alicyclobacillaceae . A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Alicyclobacillaceae . The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  7. Whole genome sequencing in clinical and public health microbiology

    PubMed Central

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    SummaryGenomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  8. The tomato genome sequence provides insights into fleshy fruit evolution.

    PubMed

    2012-05-30

    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.

  9. Complete genome sequence of Arthrobacter sp. strain FB24.

    PubMed

    Nakatsu, Cindy H; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, Thomas; Han, Cliff; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-10-16

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

  10. Whole genome sequencing in clinical and public health microbiology.

    PubMed

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology.The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology.Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories.As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future.Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  11. The complete plastid genome sequence of Bomarea edulis (Alstroemeriaceae: Liliales).

    PubMed

    Kim, Jung Sung; Kim, Hyoung Tae; Yoon, Chang Young; Kim, Joo-Hwan

    2016-05-01

    Bomarea, a member of the family Alstroemeriaceae, is distributed from Chile to Mexico and includes approximately 120 species. Recent molecular phylogenetic studies have clarified the monophyly of the family within the order Liliales and the sister relationship with the family Colchicaceae. At this time, five plastid genomes of Liliales have been analyzed at the familial level. To examine plastid genome variation at the generic level, we sequenced the plastid genome of Bomarea edulis, which is the most widely distributed species in the genus, and compared it with Alstroemeria aurea. The plastid genome sequence of B. edulis was 154,925 bp in length with a similar structure as A. aurea, excluding the IR-LSC junction. Ycf68 and infA were pseudogenes caused by frameshift mutations, and the ycf15 gene was deleted, similar to A. aurea.

  12. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575).

    PubMed

    Spring, Stefan; Lapidus, Alla; Schröder, Maren; Gleim, Dorothea; Sims, David; Meincke, Linda; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Brettin, Thomas; Detter, John C; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-11-22

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575(T), isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Chitinophaga pinensis type strain (UQM 2034).

    PubMed

    Glavina Del Rio, Tijana; Abt, Birte; Spring, Stefan; Lapidus, Alla; Nolan, Matt; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lucas, Susan

    2010-02-28

    Chitinophaga pinensis Sangkhobol and Skerman 1981 is the type strain of the species which is the type species of the rapidly growing genus Chitinophaga in the sphingobacterial family 'Chitinophagaceae'. Members of the genus Chitinophaga vary in shape between filaments and spherical bodies without the production of a fruiting body, produce myxospores, and are of special interest for their ability to degrade chitin. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family 'Chitinophagaceae', and the 9,127,347 bp long single replicon genome with its 7,397 protein-coding and 95 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Desulfohalobium retbaense type strain (HR(100)).

    PubMed

    Spring, Stefan; Nolan, Matt; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Land, Miriam; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Munk, Christine; Kiss, Hajnalka; Chain, Patrick; Han, Cliff; Brettin, Thomas; Detter, John C; Schüler, Esther; Göker, Markus; Rohde, Manfred; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-28

    Desulfohalobium retbaense (Ollivier et al. 1991) is the type species of the polyphyletic genus Desulfohalobium, which comprises, at the time of writing, two species and represents the family Desulfohalobiaceae within the Deltaproteobacteria. D. retbaense is a moderately halophilic sulfate-reducing bacterium, which can utilize H(2) and a limited range of organic substrates, which are incompletely oxidized to acetate and CO(2), for growth. The type strain HR(100) (T) was isolated from sediments of the hypersaline Retba Lake in Senegal. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Desulfohalobiaceae. The 2,909,567 bp genome (one chromosome and a 45,263 bp plasmid) with its 2,552 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of Nocardiopsis dassonvillei type strain (IMRU 509).

    PubMed

    Sun, Hui; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Pagani, Ioanna; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Djao, Olivier Duplex Ngatchou; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-11-30

    Nocardiopsis dassonvillei (Brocq-Rousseau 1904) Meyer 1976 is the type species of the genus Nocardiopsis, which in turn is the type genus of the family Nocardiopsaceae. This species is of interest because of its ecological versatility. Members of N. dassonvillei have been isolated from a large variety of natural habitats such as soil and marine sediments, from different plant and animal materials as well as from human patients. Moreover, representatives of the genus Nocardiopsis participate actively in biopolymer degradation. This is the first complete genome sequence in the family Nocardiopsaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,543,312 bp long genome consist of a 5.77 Mbp chromosome and a 0.78 Mbp plasmid and with its 5,570 protein-coding and 77 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)

    SciTech Connect

    Spring, Stefan; Lapidus, Alla L.; Schroder, Maren; Gleim, Dorothea; Sims, David; Meincke, Linda; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Brettin, Tom; Detter, J. Chris; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-01-01

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organ-ism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Draft genome sequence of the rubber tree Hevea brasiliensis

    PubMed Central

    2013-01-01

    Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber. PMID:23375136

  18. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    SciTech Connect

    Pati, Amrita; Gronow, Sabine; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Chertkov, Olga; Bruce, David; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Jeffries, Cynthia; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type stain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Using the NCBI Map Viewer to browse genomic sequence data.

    PubMed

    Wolfsberg, Tyra G

    2011-04-01

    This unit includes a basic protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The alternate protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family. 2011 by John Wiley & Sons, Inc.

  20. Using the NCBI map viewer to browse genomic sequence data.

    PubMed

    Wolfsberg, Tyra G

    2010-03-01

    This unit includes a Basic Protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The Alternate Protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family. (c) 2010 by John Wiley & Sons, Inc.