Science.gov

Sample records for genome sequencing centers

  1. Genome Sequencing Centers

    Cancer.gov

    The Cancer Genome Atlas (TCGA) Genome Sequencing Centers (GSCs) perform large-scale DNA sequencing using the latest sequencing technologies. Supported by the National Human Genome Research Institute (NHGRI) large-scale sequencing program, the GSCs generate the enormous volume of data required by TCGA, while continually improving existing technologies and methods to expand the frontier of what can be achieved in cancer genome sequencing.

  2. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  3. Operational streamlining in a high-throughput genome sequencing center

    E-print Network

    Person, Kerry P. (Kerry Patrick)

    2006-01-01

    Advances in medicine rely on accurate data that is rapidly provided. It is therefore critical for the Genome Sequencing platform of the Broad Institute of MIT and Harvard to continually strive to reduce cost, improve ...

  4. Whole Genome Sequencing

    MedlinePLUS

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  5. Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Crow, John [National Center for Genome Resources]

    2013-01-25

    John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  6. Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Crow, John

    2012-06-01

    John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. Complete Genome Sequence of Cupriavidus basilensis 4G11, Isolated from the Oak Ridge Field Research Center Site

    PubMed Central

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.

    2015-01-01

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%. PMID:25977418

  8. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site

    DOE PAGES (Beta)

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-14

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%.

  9. Genome Characterization Centers

    Cancer.gov

    Genomics is a fast-moving field with novel technologies and platforms that help characterize the genome being made available to the research community on a continual basis. The Cancer Genome Atlas (TCGA) Genome Characterization Centers (GCCs) are responsible for characterizing all of the genomic changes found in the tumors studied as part of the TCGA program.

  10. Funding Opportunity: Genomic Data Centers

    Cancer.gov

    Funding Opportunity CCG, Funding Opportunity Center for Cancer Genomics, CCG, Center for Cancer Genomics, CCG RFA, Center for cancer genomics rfa, genomic data analysis network, genomic data analysis network centers,

  11. Pash: Efficient Genome-Scale Sequence Anchoring by Positional Hashing

    E-print Network

    Batzoglou, Serafim

    ... and Molecular Biophysics, 2 Bioinformatics Research Laboratory, 3 Human Genome Sequencing Center, Department of ... chimpanzee whole-genome shotgun sequencing reads onto the human genome. The results of these comparisons ...

  12. Genome Sequencing and Cancer

    PubMed Central

    Mardis, Elaine R.

    2012-01-01

    New technologies for DNA sequencing, coupled with advanced analytical approaches, are now providing unprecedented speed and precision in decoding human genomes. This combination of technology and analysis, when applied to the study of cancer genomes, is revealing specific and novel information about the fundamental genetic mechanisms that underlie cancer’s development and progression. This review outlines the history of the past several years of development in this realm, and discusses the current and future applications that will further elucidate cancer’s genomic causes. PMID:22534183

  13. Genome Data Analysis Centers

    Cancer.gov

    The use of novel technologies, the need to integrate different data types and the immense quantity of data generated by The Cancer Genome Atlas (TCGA) Research Network has led to an expansion of the TCGA Research Network to include new centers devoted to data analysis. The Genome Data Analysis Centers (GDACs) work hand-in-hand with the Genome Characterization Centers (GCCs) to develop state-of-the-art tools that assist researchers with processing and integrating data analyses across the entire genome.

  14. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site

    SciTech Connect

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-14

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%.

  15. Genome sequencing conference II

    SciTech Connect

    Not Available

    1990-01-01

    Genome Sequencing Conference 2 was held September 30 to October 30, 1990. 26 speaker abstracts and 33 poster presentations were included in the program report. New and improved methods for DNA sequencing and genetic mapping were presented. Many of the papers were concerned with accuracy and speed of acquisition of data with computers and automation playing an increasing role. Individual papers have been processed separately for inclusion on the database.

  16. Unlocking hidden genomic sequence

    PubMed Central

    Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

    2004-01-01

    Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330

  17. Prenatal Whole Genome Sequencing

    PubMed Central

    Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

    2014-01-01

    With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means. PMID:22777977

  18. Center for Cancer Genomics | Office of Cancer Genomics

    Cancer.gov

    The Center for Cancer Genomics (CCG) was established to unify the National Cancer Institute's activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approach

  19. Towards Sequencing Cotton (Gossypium) Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  20. Rice genomics: current status of genome sequencing.

    PubMed

    Matsumoto, T; Wu, J; Baba, T; Katayose, Y; Yamamoto, K; Sakata, K; Yano, M; Sasaki, T

    2001-01-01

    Since its establishment in 1991, the Rice Genome Research Program (RGP) has produced some basic tools for rice genome analysis, including a cDNA catalogue, a genetic linkage map and a yeast artificial chromosome (YAC)-based physical map. For the further development of rice genomics, RGP launched in 1998 an international collaborative project on rice genome sequencing. A P1-derived artificial chromosome (PAC)-based, sequence-ready physical map has been constructed using the PCR markers from cDNA sequences (expressed sequence tag [EST] markers). Selected PAC clones with 100-150 kb inserts from chromosomes 1 and 6 have been subjected to shotgun sequencing. The assembled genomic sequences, after predicting the gene-coding region, have been published both through a public database and through our website. As of January 2000, 1.9 Mb from 13 PAC clones were published. Future prospects for understanding rice genomic information at the nucleotide level are discussed. PMID:11387985

  1. Operations capability improvement of a molecular biology laboratory in a high throughput genome sequencing center

    E-print Network

    Vokoun, Matthew R. (Matthew Richard)

    2005-01-01

    The Broad Institute is a research collaboration of MIT, Harvard University and affiliated hospitals, and the Whitehead Institute for Biomedical Research. Its scientific mission is to "(1) create tools for genomic medicine ...

  2. Advances in plant genome sequencing.

    PubMed

    Hamilton, John P; Buell, C Robin

    2012-04-01

    The study of plant biology in the 21st century is, and will continue to be, vastly different from that in the 20th century. One driver for this has been the use of genomics methods to reveal the genetic blueprints for not one but dozens of plant species, as well as resolving genome differences in thousands of individuals at the population level. Genomics technology has advanced substantially since publication of the first plant genome sequence, that of Arabidopsis thaliana, in 2000. Plant genomics researchers have readily embraced new algorithms, technologies and approaches to generate genome, transcriptome and epigenome datasets for model and crop species that have permitted deep inferences into plant biology. Challenges in sequencing any genome include ploidy, heterozygosity and paralogy, all which are amplified in plant genomes compared to animal genomes due to the large genome sizes, high repetitive sequence content, and rampant whole- or segmental genome duplication. The ability to generate de novo transcriptome assemblies provides an alternative approach to bypass these complex genomes and access the gene space of these recalcitrant species. The field of genomics is driven by technological improvements in sequencing platforms; however, software and algorithm development has lagged behind reductions in sequencing costs, improved throughput, and quality improvements. It is anticipated that sequencing platforms will continue to improve the length and quality of output, and that the complementary algorithms and bioinformatic software needed to handle large, repetitive genomes will improve. The future is bright for an exponential improvement in our understanding of plant biology. PMID:22449051

  3. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
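    The two consensus error rates quoted in this abstract (about 1 error per 2,000 bp for a draft assembly, about 1 per 10,000 bp for a finished genome) are easier to compare as expected error counts. The sketch below is a minimal illustration only: the function name `expected_errors` and the 5 Mb genome size are assumptions chosen for the example, not values from the record, and errors are assumed to be spread uniformly.

```python
def expected_errors(genome_length_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors, assuming a uniform error rate."""
    return genome_length_bp / bp_per_error


# Rates taken from the abstract, expressed as base pairs per error.
DRAFT_BP_PER_ERROR = 2_000
FINISHED_BP_PER_ERROR = 10_000

# Hypothetical 5 Mb microbial genome, chosen only for illustration.
genome_length = 5_000_000

draft = expected_errors(genome_length, DRAFT_BP_PER_ERROR)
finished = expected_errors(genome_length, FINISHED_BP_PER_ERROR)
print(f"draft: ~{draft:.0f} errors; finished: ~{finished:.0f} errors")
```

    Under these assumptions, finishing reduces the expected error count fivefold, whatever the genome size.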

  4. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  5. Microbial genome sequencing and pathogenesis.

    PubMed

    Tang, C M; Hood, D W; Moxon, E R

    1998-02-01

    The year 1997 saw the publication of the complete nucleotide sequence of Helicobacter pylori and Escherichia coli. It is conceivable that the complete nucleotide sequence for all the major human bacterial pathogens will be available by the end of the century. Database alignments have been used to ascribe the putative functions of open reading frames in the sequenced isolates and to define the differences between bacterial species at the nucleotide level. The most striking finding from all genome projects has been the high proportion of open reading frames that have no known function. Experimental data demonstrating the utility of the genome sequencing projects are only just beginning to emerge. PMID:10066467

  6. The complete sequence of a heterochromatic island from a higher eukaryote. The Cold Spring Harbor Laboratory, Washington University Genome Sequencing Center, and PE Biosystems Arabidopsis Sequencing Consortium.

    PubMed

    2000-02-01

    Heterochromatin, constitutively condensed chromosomal material, is widespread among eukaryotes but incompletely characterized at the nucleotide level. We have sequenced and analyzed 2.1 megabases (Mb) of Arabidopsis thaliana chromosome 4 that includes 0.5-0.7 Mb of isolated heterochromatin that resembles the chromosomal knobs described by Barbara McClintock in maize. This isolated region has a low density of expressed genes, low levels of recombination and a low incidence of genetrap insertion. Satellite repeats were absent, but tandem arrays of long repeats and many transposons were found. Methylation of these sequences was dependent on chromatin remodeling. Clustered repeats were associated with condensed chromosomal domains elsewhere. The complete sequence of a heterochromatic island provides an opportunity to study sequence determinants of chromosome condensation. PMID:10676819

  7. Sequencing and mapping of the onion genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  8. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Institute Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  9. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Institute Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  10. Fuzzy Genome Sequence Assembly for Single and Environmental Genomes

    E-print Network

    Nicolescu, Monica

    ... and to the first genome sequence assembly, Bacteriophage φX174 [38]. In 1990 the Human Genome Project ... in 2003, two years before its projected date. ... Sara Nasser, et al. ... In 1993 The Institute for Genome ... advancements in technology that lead to the complete sequencing of the Human Genome and the H. influenzae ...

  11. Challenges of sequencing human genomes

    PubMed Central

    Ding, Li; Mardis, Elaine R.; Wilson, Richard K.

    2010-01-01

    Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35–250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans. PMID:20519329

  12. Comparative Sequencing of Plant Genomes: Choices to Make

    E-print Network

    Purugganan, Michael D.

    COMMENTARY: Comparative Sequencing of Plant Genomes: Choices to Make. The first sequenced genome of a plant, Arabidopsis thaliana, was published ~6 years ago (Arabidopsis Genome Initiative, 2000). Since ... Information Entrez Genome Projects website reports that sequencing of several more plant genomes is in prog...

  13. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  14. Almost finished: the complete genome sequence of Mycosphaerella graminicola

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycosphaerella graminicola causes septoria tritici blotch of wheat. An 8.9x shotgun sequence of bread wheat strain IPO323 was generated through the Community Sequencing Program of the U.S. Department of Energy’s Joint Genome Institute (JGI), and was finished at the Stanford Human Genome Center. The ...

  15. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  16. Sequencing Centers Panel at SFAF

    SciTech Connect

    Schilkey, Faye; Ali, Johar; Grafham, Darren; Muzny, Donna; Fulton, Bob; Fitzgerald, Mike; Hostetler, Jessica; Daum, Chris

    2010-06-02

    From left to right: Faye Schilkey of NCGR, Johar Ali of OICR, Darren Grafham of Wellcome Trust Sanger Institute, Donna Muzny of the Baylor College of Medicine, Bob Fulton of Washington University, Mike Fitzgerald of the Broad Institute, Jessica Hostetler of the J. Craig Venter Institute and Chris Daum of the DOE Joint Genome Institute discuss sequencing technologies, applications and pipelines on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  17. The fungal genome initiative and lessons learned from genome sequencing.

    PubMed

    Cuomo, Christina A; Birren, Bruce W

    2010-01-01

    The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The "fungal genome initiative" represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal kingdom. Through such an approach, scientists can not only better understand specific organisms but also illuminate the shared and unique aspects of fungal biology that underlie the importance of fungi in biomedical research, health, food production, and industry. To date, assembled genomes for over 100 fungi are available in public databases, and many more sequencing projects are underway. Here, we discuss both examples of findings from comparative analysis of fungal sequences, with a specific emphasis on yeast genomes, and on the analytical approaches taken to mine fungal genomes. New sequencing methods are accelerating comparative studies of fungi by reducing the cost and difficulty of sequencing. This has driven more common use of sequencing applications, such as to study genome-wide variation in populations or to deeply profile RNA transcripts. These and further technological innovations will continue to be piloted in yeasts and other fungi, and will expand the applications of sequencing to study fungal biology. PMID:20946837

  18. Sequencing a Genome by Walking With Clone-end Sequences

    E-print Network

    Sequencing a Genome by Walking With Clone-end Sequences: A Mathematical Analysis. Serafim Batzoglou ... -insert clones (such as bacterial artificial chromosomes (BACs)) and then (ii) to take successive 'walking' steps by selecting and sequencing minimally overlapping clones, using information such as clone-end sequences ...
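    The 'walking' step mentioned in this snippet, selecting minimally overlapping clones one after another, can be illustrated with a toy greedy procedure. This is a sketch under strong assumptions, not the analysis from the paper: clone coordinates are taken as known (in practice overlaps would be inferred from clone-end sequences), the walk goes in one direction only, and the names `Clone` and `walk_right` are hypothetical.

```python
from typing import List, Tuple

# (start, end) coordinates of a clone; known only in this toy setting.
Clone = Tuple[int, int]


def walk_right(seed: Clone, library: List[Clone]) -> List[Clone]:
    """Greedy rightward walk from a seed clone.

    At each step, choose the clone that overlaps the most recently
    sequenced clone the least while still extending the covered region.
    """
    path = [seed]
    covered_end = seed[1]
    while True:
        last_start = path[-1][0]
        # A clone is a candidate if its left end lies inside the last
        # sequenced clone (a stand-in for a clone-end sequence match)
        # and its right end reaches beyond the current boundary.
        candidates = [
            c for c in library
            if last_start <= c[0] < covered_end and c[1] > covered_end
        ]
        if not candidates:
            return path
        # Minimal overlap means the left end closest to the boundary.
        next_clone = max(candidates, key=lambda c: c[0])
        path.append(next_clone)
        covered_end = next_clone[1]


library = [(0, 100), (40, 140), (90, 190), (150, 250), (185, 285), (300, 400)]
path = walk_right((0, 100), library)
print(path)
```

    In this example the walk stops before the clone at 300-400 because no clone bridges the gap, which is the situation a real walking strategy would need a fresh seed to resolve.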

  19. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  20. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  1. The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome

    E-print Network

    Jiggins, Francis

    ... Andrews, Fife, United Kingdom, 3 Department of Zoology, University of Cambridge, Cambridge, United Kingdom, 4 Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of ... Biochemistry, University of Cambridge, Cambridge, United Kingdom, 13 Evolutionsbiologie, Zoologisches Institut ...

  2. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  3. Twenty years of bacterial genome sequencing.

    PubMed

    Loman, Nicholas J; Pallen, Mark J

    2015-12-01

    Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a 'sequencing singularity', where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond. PMID:26548914

  4. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  5. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the ~120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes ~13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  6. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  7. Genome sequence of Coxiella burnetii strain Namibia.

    PubMed

    Walter, Mathias C; Öhrman, Caroline; Myrtennäs, Kerstin; Sjödin, Andreas; Byström, Mona; Larsson, Pär; Macellaro, Anna; Forsman, Mats; Frangoulidis, Dimitrios

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  8. Streptococcal taxonomy based on genome sequence analyses

    PubMed Central

    2013-01-01

    The identification of the clinically relevant viridans streptococci group, at species level, is still problematic. The aim of this study was to extract taxonomic information from the complete genome sequences of 67 streptococci, comprising 19 species, by means of genomic analyses, multilocus sequence analysis (MLSA), average amino acid identity (AAI), genomic signatures, genome-to-genome distances (GGD) and codon usage bias. We then attempted to determine the usefulness of these genomic tools for species identification in streptococci. Our results showed that MLSA, AAI and GGD analyses are robust markers to identify streptococci at the species level, for instance, S. pneumoniae, S. mitis, and S. oralis. A Streptococcus species can be defined as a group of strains that share ≥ 95% DNA similarity in MLSA and AAI, and > 70% DNA identity in GGD. This approach allows an advanced understanding of bacterial diversity. PMID:24358875
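The species definition in this abstract amounts to a three-part threshold test. The sketch below is illustrative only and is not the authors' code: the class and function names are hypothetical, all values are assumed to be percentages, and the MLSA and AAI cutoff is assumed to be inclusive (at least 95%) while the GGD cutoff is strict (more than 70%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseScores:
    """Pairwise comparison values for two strains, all in percent (hypothetical container)."""
    mlsa_similarity: float  # multilocus sequence analysis similarity
    aai: float              # average amino acid identity
    ggd_identity: float     # genome-to-genome distance analysis, reported as identity


def same_species(scores: PairwiseScores) -> bool:
    """Return True when all three cutoffs quoted in the abstract are met."""
    return (
        scores.mlsa_similarity >= 95.0   # assumed inclusive cutoff
        and scores.aai >= 95.0           # assumed inclusive cutoff
        and scores.ggd_identity > 70.0   # strict cutoff
    )


if __name__ == "__main__":
    # Made-up values, used only to exercise the function.
    print(same_species(PairwiseScores(96.2, 95.0, 71.5)))
    print(same_species(PairwiseScores(96.2, 95.0, 70.0)))
```

Requiring all three criteria at once is the strictest reading of the sentence; a real workflow might treat the markers separately.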

  9. Draft Genome Sequences of the Onion Center Rot Pathogen Pantoea ananatis PA4 and Maize Brown Stalk Rot Pathogen P. ananatis BD442

    PubMed Central

    Weller-Stuart, Tania; Chan, Wai Yin; Venter, Stephanus N.; Smits, Theo H. M.; Duffy, Brion; Goszczynska, Teresa; Cowan, Don A.; de Maayer, Pieter

    2014-01-01

    Pantoea ananatis is an emerging phytopathogen that infects a broad spectrum of plant hosts. Here, we present the genomes of two South African isolates, P. ananatis PA4, which causes center rot of onion, and BD442, isolated from brown stalk rot of maize. PMID:25103759

  10. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  11. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  12. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships

    PubMed Central

    2014-01-01

    Background Camellia is an economically and phylogenetically important genus in the family Theaceae. Owing to numerous hybridization and polyploidization events, it is taxonomically and phylogenetically ranked as one of the most challenging taxa in plants. Sequence comparisons of chloroplast (cp) genomes are of great interest to provide robust evidence for taxonomic studies, species identification and understanding mechanisms that underlie the evolution of the Camellia species. Results The eight complete cp genomes and five draft cp genome sequences of Camellia species were determined using Illumina sequencing technology via a combined strategy of de novo and reference-guided assembly. The Camellia cp genomes exhibited typical circular structure that was rather conserved in genomic structure and the synteny of gene order. Differences of repeat sequences, simple sequence repeats, indels and substitutions were further examined among five complete cp genomes, representing a wide phylogenetic diversity in the genus. A total of fifteen molecular markers were identified with more than 1.5% sequence divergence that may be useful for further phylogenetic analysis and species identification of Camellia. Our results showed that, rather than functional constraints, it is the regional constraints that strongly affect sequence evolution of the cp genomes. In a substantial improvement over prior studies, evolutionary relationships of the section Thea were determined on the basis of phylogenomic analyses of cp genome sequences. Conclusions Despite a high degree of conservation between the Camellia cp genomes, sequence variation among species could still be detected, representing a wide phylogenetic diversity in the genus. Furthermore, phylogenomic analysis was conducted using 18 complete cp genomes and 5 draft cp genome sequences of Camellia species. Our results support Chang’s taxonomical treatment that C. pubicosta may be classified into sect. Thea, and indicate that taxonomical value of the number of ovaries should be reconsidered when classifying the Camellia species. The availability of these cp genomes provides valuable genetic information for accurately identifying species, clarifying taxonomy and reconstructing the phylogeny of the genus Camellia. PMID:25001059

  13. Using the Potato Genome Sequence. Robin Buell

    E-print Network

    Douches, David S.

    Using the Potato Genome Sequence. Robin Buell, Michigan State University, Department of Plant ... · RHPOTKEY BAC library (78000 clones; 9-10 g.e.) · Library clones fingerprinted ... Gen Sequencing? ... Initial Strategy: heterozygous clone (RH89-039-16) ... Contig assembly

  14. Complete genome sequence of trivittatus virus.

    PubMed

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-10-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group and the Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which, interestingly, shows only a few differences when compared to partial sequences of modern strains. PMID:26212363

  15. Center for Cancer Genomics Launches New Website

    Cancer.gov

    CCG was established to unify NCI’s activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approaches, CCG aims to accelerate structural, functional and computational research to explore cancer mechanisms, discover new cancer targets, and develop new therapeutics.

  16. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  17. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70 percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78 percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75 percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  18. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    SciTech Connect

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  19. Using comparative genomics to reorder the human genome sequence into a virtual sheep genome

    PubMed Central

    Dalrymple, Brian P; Kirkness, Ewen F; Nefedov, Mikhail; McWilliam, Sean; Ratnakumar, Abhirami; Barris, Wes; Zhao, Shaying; Shetty, Jyoti; Maddox, Jillian F; O'Grady, Margaret; Nicholas, Frank; Crawford, Allan M; Smith, Tim; de Jong, Pieter J; McEwan, John; Oddy, V Hutton; Cockett, Noelle E

    2007-01-01

    Background Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes? Results A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the human, dog, and cow genomes. To maximize genome coverage, the coordinates of all BAC end sequence hits to the cow and dog genomes were also converted to the equivalent human genome coordinates. The 84,624 sheep BACs (about 5.4-fold genome coverage) with paired ends in the correct orientation (tail-to-tail) and spacing, combined with information from sheep BAC comparative genome contigs (CGCs) built separately on the dog and cow genomes, were used to construct 1,172 sheep BAC-CGCs, covering 91.2% of the human genome. Clustered non-tail-to-tail and outsize BACs located close to the ends of many BAC-CGCs linked BAC-CGCs covering about 70% of the genome to at least one other BAC-CGC on the same chromosome. Using the BAC-CGCs, the intrachromosomal and interchromosomal BAC-CGC linkage information, human/cow and vertebrate synteny, and the sheep marker map, a virtual sheep genome was constructed. To identify BACs potentially located in gaps between BAC-CGCs, an additional set of 55,668 sheep BACs were positioned on the sheep genome with lower confidence. A coordinate conversion process allowed us to transfer human genes and other genome features to the virtual sheep genome to display on a sheep genome browser. Conclusion We demonstrate that limited sequencing of BACs combined with positioning on a well assembled genome and integrating locations from other less well assembled genomes can yield extensive, detailed subgene-level maps of mammalian genomes, for which genomic resources are currently limited. PMID:17663790

  20. Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants

    PubMed Central

    MacArthur, Jacqueline A. L.; Morales, Joannella; Tully, Ray E.; Astashyn, Alex; Gil, Laurent; Bruford, Elspeth A.; Larsson, Pontus; Flicek, Paul; Dalgleish, Raymond; Maglott, Donna R.; Cunningham, Fiona

    2014-01-01

    Locus Reference Genomic (LRG; http://www.lrg-sequence.org/) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable ‘fixed’ section and a regularly updated ‘updatable’ section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications. PMID:24285302

  1. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features make Dunaliella genomic DNA difficult to sequence. Further investigation is needed to reveal the biological significance of these unique sequence features. PMID:17091199
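Two of the statistics reported in this abstract, GC content and the prevalence of (AC)n simple sequence repeats, are straightforward to compute from a raw sequence string. The following is a minimal sketch, not the authors' pipeline: the function names and the `min_copies` cutoff are assumptions made for illustration, and only perfect, uninterrupted repeats on the given strand are detected.

```python
import re


def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def find_repeats(seq: str, unit: str = "AC", min_copies: int = 4):
    """Yield (start, copies) for every run of `unit` repeated at least `min_copies` times.

    `min_copies` is an illustrative cutoff, not a value taken from the abstract.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}", re.IGNORECASE)
    for match in pattern.finditer(seq):
        yield match.start(), (match.end() - match.start()) // len(unit)


if __name__ == "__main__":
    fragment = "TTACACACACACGG"  # toy sequence, not real Dunaliella data
    print(gc_content(fragment))
    print(list(find_repeats(fragment)))
```

A fuller survey would also scan the reverse complement and other repeat units before estimating the share of (AC)n among all microsatellites.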

  2. Finishing the euchromatic sequence of the human genome

    E-print Network

    Brutlag, Doug

    foundation for biomedical research in the decades ahead. The Human Genome Project (HGP) was launched in 1990 ... Finishing the euchromatic sequence of the human genome. International Human Genome Sequencing ... The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich

  3. Genome sequence of the Brown Norway rat yields insights into

    E-print Network

    Payseur, Bret

    Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Rat Genome ... having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome

  4. Complete Genome Sequence Analysis of Bacillus subtilis T30.

    PubMed

    Xu, Shuang-Yong; Boitano, Matthew; Clark, Tyson A; Vincze, Tamas; Fomenkov, Alexey; Kumar, Sanjay; Too, Priscilla Hiu-Mei; Gonchar, Danila; Degtyarev, Sergey K; Roberts, Richard J

    2015-01-01

    The complete genome sequence of Bacillus subtilis T30 was determined by SMRT sequencing. The entire genome contains 4,138 predicted genes. The genome carries one intact prophage sequence (37.4 kb) similar to Bacillus phage SPBc2 and one incomplete prophage genome of 39.9 kb similar to Bacillus phage phi105. PMID:25953183

  5. Complete Genome Sequence Analysis of Bacillus subtilis T30

    PubMed Central

    Boitano, Matthew; Clark, Tyson A.; Vincze, Tamas; Fomenkov, Alexey; Kumar, Sanjay; Too, Priscilla Hiu-Mei; Gonchar, Danila; Degtyarev, Sergey K.

    2015-01-01

    The complete genome sequence of Bacillus subtilis T30 was determined by SMRT sequencing. The entire genome contains 4,138 predicted genes. The genome carries one intact prophage sequence (37.4 kb) similar to Bacillus phage SPBc2 and one incomplete prophage genome of 39.9 kb similar to Bacillus phage phi105. PMID:25953183

  6. An International Plan to Sequence the Onion Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  7. Genome Sequence of Fusobacterium nucleatum Subspecies Polymorphum --a Genetically Tractable

    E-print Network

    Fox, George

    in dental plaque biofilms, and important in biofilm ecology and human infectious diseases. Dental plaque is a complex and dynamic microbial community that forms as a biofilm on teeth, and harbors more than 400 ... Sarah K. Highlander ... Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas

  8. Microbial species delineation using whole genome sequences

    PubMed Central

    Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

    2015-01-01

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
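The abstract describes grouping genomes by complete linkage clustering on pairwise AF and gANI values. The sketch below shows the general idea under stated assumptions and is not the MiSI implementation: the function name, the input layout, and the cutoff parameters `min_af` and `min_gani` are hypothetical, and the abstract quoted here does not give the actual cutoffs. Under complete linkage, two clusters merge only when every cross-cluster pair passes both criteria.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]


def complete_linkage_clusters(
    genomes: List[str],
    scores: Dict[Pair, Tuple[float, float]],  # unordered pair -> (AF, gANI)
    min_af: float,
    min_gani: float,
) -> List[Set[str]]:
    """Greedy threshold-based complete linkage; missing pairs count as unrelated."""

    def related(a: str, b: str) -> bool:
        af, gani = scores.get(frozenset((a, b)), (0.0, 0.0))
        return af >= min_af and gani >= min_gani

    clusters: List[Set[str]] = [{g} for g in genomes]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            # Merge only if ALL cross-cluster pairs are related.
            if all(related(a, b) for a in clusters[i] for b in clusters[j]):
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break  # cluster indices changed, so restart the scan
    return clusters


if __name__ == "__main__":
    # Toy values and placeholder cutoffs, for illustration only.
    toy_scores = {
        frozenset(("A", "B")): (0.9, 98.0),
        frozenset(("B", "C")): (0.8, 97.0),
    }
    print(complete_linkage_clusters(["A", "B", "C", "D"], toy_scores, 0.5, 95.0))
```

In the toy input, C is related to B but not to A, so it stays outside the A-B cluster. Single linkage would have joined all three, which is why the choice of linkage matters when comparing clusters against existing taxonomy.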

  9. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  10. Genome Sequence of Phytophthora ramorum: Implications for Management

    E-print Network

    Genome Sequence of Phytophthora ramorum: Implications for Management. Brett Tyler, Sucheta ... A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage

  11. Standardized Metadata for Human Pathogen/Vector Genomic Sequences

    PubMed Central

    Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
    The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

  12. Standardized metadata for human pathogen/vector genomic sequences.

    PubMed

    Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
    The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

  13. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  14. Genomic Sequencing George M. Church, Walter Gilbert

    E-print Network

    Church, George M.

    at deoxycytidines, and nucleic acid-protein interactions at single nucleotide resolution. How can we visualize ABSTRACT Unique DNA sequences can be determined directly from mouse genomic DNA. A denaturing gel separates

  15. Complete Genome Sequence of Treponema pallidum, the

    E-print Network

    Salzberg, Steven

    Complete Genome Sequence of Treponema pallidum, the Syphilis Spirochete Claire M. Fraser,* Steven J and substantiates the considerable diversity observed among pathogenic spirochetes. Venereal syphilis was first century with the age of exploration. Syphilis was ubiquitous by the 19th century and has been called

  16. Genomes and evolution From sequence to organism

    E-print Network

    Patel, Nipam H.

    Genomes and evolution From sequence to organism Editorial overview Evan E Eichler and Nipam H Patel. Nipam H Patel Depts. of Integrative Biology and Molecular Cell Biology, University of California, 3060 VLSB #3140, Berkeley, CA 94720-3140, USA e-mail: nipam@uclink.berkeley.edu URL: http

  17. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries, which theoretically contain all the genes that are switched on at a particular time in a particular tissue. The researchers then sequence just a short stretch of each clone, about 400 to 500 bases, to create an expressed sequence tag, or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can recreate that EST by using polymerase chain reaction techniques.

  18. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics. PMID:24143115

  19. Whole-genome sequencing in bacteriology: state of the art.

    PubMed

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics. PMID:24143115

  20. Draft Genome Sequence of Mycobacterium arupense Strain GUC1

    PubMed Central

    Greninger, Alexander L.; Cunningham, Gail; Yu, Joanna M.; Hsu, Elaine D.; Chiu, Charles Y.

    2015-01-01

    We report the draft genome sequence of Mycobacterium arupense strain GUC1 from a sputum sample of a patient with bronchiectasis. This is the first draft genome sequence of Mycobacterium arupense, a rapidly growing nonchromogenic mycobacterium. PMID:26067970

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon

    E-print Network

    Green, Pamela

    ARTICLES Genome sequencing and analysis of the model grass Brachypodium distachyon The International Brachypodium Initiative* Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our

  2. Letter to the Editor Toward Sequencing Cotton (Gossypium) Genomes

    E-print Network

    Chee, Peng W.

    Letter to the Editor Toward Sequencing Cotton (Gossypium) Genomes Despite rapidly decreasing costs complex genomes de novo. The cotton (Gossypium spp.) genomes represent a challenging case. To this end, a coalition of cotton genome scientists has developed a strategy for sequencing the cotton genomes, which

  3. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  4. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  5. Simple sequence repeats in prokaryotic genomes

    PubMed Central

    Mrázek, Jan; Guo, Xiangxue; Shah, Apurva

    2007-01-01

    Simple sequence repeats (SSRs) in DNA sequences are composed of tandem iterations of short oligonucleotides and may have functional and/or structural properties that distinguish them from general DNA sequences. They are variable in length because of slip-strand mutations and may also affect local structure of the DNA molecule or the encoded proteins. Long SSRs (LSSRs) are common in eukaryotes but rare in most prokaryotes. In pathogens, SSRs can enhance antigenic variance of the pathogen population in a strategy that counteracts the host immune response. We analyze representations of SSRs in >300 prokaryotic genomes and report significant differences among different prokaryotes as well as among different types of SSRs. LSSRs composed of short oligonucleotides (1–4 bp length, designated LSSR1–4) are often found in host-adapted pathogens with reduced genomes that are not known to readily survive in a natural environment outside the host. In contrast, LSSRs composed of longer oligonucleotides (5–11 bp length, designated LSSR5–11) are found mostly in nonpathogens and opportunistic pathogens with large genomes. Comparisons among SSRs of different lengths suggest that LSSR1–4 are likely maintained by selection. This is consistent with the established role of some LSSR1–4 in enhancing antigenic variance. By contrast, abundance of LSSR5–11 in some genomes may reflect the SSRs' general tendency to expand rather than their specific role in the organisms' physiology. Differences among genomes in terms of SSR representations and their possible interpretations are discussed. PMID:17485665
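    The definition of an SSR given above (tandem iterations of a short oligonucleotide) can be illustrated with a minimal Python sketch. This is not the authors' method: the function names and the thresholds (`max_unit`, `min_copies`, `min_length`) are hypothetical choices made only for illustration, and the abstract does not state the length cutoffs used to call a repeat "long".

```python
import re


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter string (e.g. 'AA' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq, max_unit=11, min_copies=3, min_length=10):
    """Return sorted (start, unit, copies) tuples for tandem repeats in `seq`.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A unit of k bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            length = m.end() - m.start()
            # Skip units such as 'AA' that merely restate a shorter repeat.
            if is_primitive(unit) and length >= min_length:
                hits.append((m.start(), unit, length // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "A" * 12 + "CGT" + "CAGTG" * 4 + "TT"
    for start, unit, copies in find_ssrs(demo):
        print(start, unit, copies)
```

    In the abstract's terms, a hit with a unit of 1 to 4 bp would fall in the LSSR1–4 class and a hit with a unit of 5 to 11 bp in the LSSR5–11 class, provided it also met whatever length cutoff the study applied.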

  6. Assessing the Costs and Cost-Effectiveness of Genomic Sequencing.

    PubMed

    Christensen, Kurt D; Dukhovny, Dmitry; Siebert, Uwe; Green, Robert C

    2015-01-01

    Despite dramatic drops in DNA sequencing costs, concerns are great that the integration of genomic sequencing into clinical settings will drastically increase health care expenditures. This commentary presents an overview of what is known about the costs and cost-effectiveness of genomic sequencing. We discuss the cost of germline genomic sequencing, addressing factors that have facilitated the decrease in sequencing costs to date and anticipating the factors that will drive sequencing costs in the future. We then address the cost-effectiveness of diagnostic and pharmacogenomic applications of genomic sequencing, with an emphasis on the implications for secondary findings disclosure and the integration of genomic sequencing into general patient care. Throughout, we ground the discussion by describing efforts in the MedSeq Project, an ongoing randomized controlled clinical trial, to understand the costs and cost-effectiveness of integrating whole genome sequencing into cardiology and primary care settings. PMID:26690481

  7. Complete Genome Sequences of 138 Mycobacteriophages

    PubMed Central

    2012-01-01

    Bacteriophages are the most numerous biological entities in the biosphere, and although their genetic diversity is high, it remains ill defined. Mycobacteriophages, the viruses of mycobacterial hosts, provide insights into this diversity as well as tools for manipulating Mycobacterium tuberculosis. We report here the complete genome sequences of 138 new mycobacteriophages, which, together with the 83 mycobacteriophages previously reported, represent the largest collection of phages known to infect a single common host, Mycobacterium smegmatis mc²155. PMID:22282335

  8. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  9. Revolutionize your capacity to understand the scale of transcriptomic diversity one cell at a time. Duke Center for Genomic

    E-print Network

    Richardson, David

    at a time. GCB Duke Center for Genomic and Computational Biology Sequencing and Genomic Technologies Shared Resource Single Cell Analysis CONTACT US microarray@duke.edu URL: http://www.genome.duke.edu/cores-and-services on Illumina sequencing platforms and data can be analyzed by the Integrative Genomic Analysis Shared Resource

  10. CovarisTM S220 System can be used for multiple types of DNA shearing. Duke Center for Genomic

    E-print Network

    Richardson, David

    CovarisTM S220 System can be used for multiple types of DNA shearing. GCB Duke Center for Genomic and Computational Biology Sequencing and Genomic Technologies Shared Resource Covaris Shearing CONTACT US microarray@duke.edu URL: http://www.genome.duke.edu/cores-and-services/sequencing-and-genomic-technologies Sample

  11. A comparison of virus genome sequences with their host silkworm, Bombyx mori.

    PubMed

    Tang, Xu-Dong; Yue, Ya-Jie; Wang, Wei; Li, Nan; Shen, Zhong-Yuan

    2016-01-15

    With the recent availability of the genomes of many viruses and the silkworm, Bombyx mori, as well as a variety of Basic Local Alignment Search Tool (BLAST) programs, a new opportunity to gain insight into the interaction of viruses with the silkworm is possible. This study aims to determine the possible existence of sequence identities between the genomes of viruses and the silkworm and attempts to explain this phenomenon. BLAST searches of the genomes of viruses against the silkworm genome were performed using the resources of the National Center for Biotechnology Information. All studied viruses contained variable numbers of short regions with sequence identity to the genome of the silkworm. The short regions of sequence identity in the genome of the silkworm may be derived from the genomes of viruses in the long history of silkworm-virus interaction. This study is the first to compare these genomes, and may contribute to research on the interaction between viruses and the silkworm. PMID:26432002

  12. GENOMIC DIVERGENCES AMONG CATTLE, DOG, AND HUMAN ESTIMATED FROM LARGE-SCALE ALIGNMENTS OF GENOMIC SEQUENCES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We performed a detailed analysis of genomic divergences based on large-scale comparison of 11 Mb of genomic sequence from cattle, human and dog. Using human and dog genome assemblies as references, optimal 3-way global alignments were constructed for 84 cattle large (>50 kb) genomic sequence clones...

  13. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  14. Optimizing the BAC-End Strategy for Sequencing the Human Genome

    E-print Network

    Shamir, Ron

    University, Tel Aviv, 69978, Israel. 1 Introduction With the Human Genome Project moving from the map sequencing has become central. The classical strategy set forth by the founders of the Human Genome Project Optimizing the BAC-End Strategy for Sequencing the Human Genome Richard M. Karp, Ron Shamir

  15. DATABASE Open Access Whole genome sequencing of peach (Prunus

    E-print Network

    Crisosto, Carlos H.

    DATABASE Open Access Whole genome sequencing of peach (Prunus persica L.) for SNP identification high frequency SNPs distributed throughout the peach genome is described. Three peach genomes were `Lovell' peach sequence as well as sufficient depth of coverage for `in silico' SNP discovery. Description

  16. Genome sequence of the Pea Aphid Acyrthosiphon pisum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The International aphid genome consortium, IAGC, herein presents the 464 Mb draft genome assembly sequence of the pea aphid Acyrthosiphon pisum. This is the first published whole genome sequence from the diverse assemblage of hemimetabolous insects, providing an outgroup to the multiple published g...

  17. Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion

    E-print Network

    Yuan, Guo-Cheng "GC"

    in human. Citation: Yuan G-C, Liu JS (2008) Genomic sequence is highly predictive of local nucleosome with nucleosome binding may offer valuable insight. In addition, as high-resolution mapping of genome Genomic Sequence Is Highly Predictive of Local Nucleosome Depletion Guo-Cheng Yuan1,2,* , Jun S

  18. Complete Genome Sequence of the Embu Virus Strain SPAn880

    PubMed Central

    Antwerpen, Markus; Georgi, Enrico; Vette, Philipp; Zoeller, Gudrun; Meyer, Hermann

    2014-01-01

    We report the complete genome sequence of the Embu virus. The genome consists of 185,139 bp and is nearly identical to that of the Cotia virus. This is the first report on the Embu virus genome sequence, which has been considered an unclassified poxvirus until now. PMID:25477400

  19. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  20. Complete genome sequence of bacteriophage T5.

    PubMed

    Wang, Jianbin; Jiang, Yan; Vincent, Myriam; Sun, Yongqiao; Yu, Hong; Wang, Jing; Bao, Qiyu; Kong, Huimin; Hu, Songnian

    2005-02-01

    The 121,752-bp genome sequence of bacteriophage T5 was determined; the linear, double-stranded DNA is nicked in one of the strands and has large direct terminal repeats of 10,139 bp (8.3%) at both ends. The genome structure is consistently arranged according to its lytic life cycle. Of the 168 potential open reading frames (ORFs), 61 were annotated; these annotated ORFs are mainly enzymes involved in phage DNA replication, repair, and nucleotide metabolism. At least five endonucleases are believed to help induce nicks in T5 genomic DNA, and a DNA ligase gene was found to be split into two separate ORFs. Analysis of T5 early promoters suggests a probable motif AAA{3, 4 T}nTTGCTT{17, 18 n}TATAATA{12, 13 W}{10 R} for strong promoters that may strengthen the step modification of host RNA polymerase, and thus control transcription of phage DNA. The distinct protein domain profile and a mosaic genome structure suggest an origin from the common genetic pool. PMID:15661140
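    The promoter motif quoted above can be expressed as a regular expression, sketched here in Python. The reading of the brace notation is an assumption, not something the abstract spells out: `{3, 4 T}` is taken to mean three or four T residues, `{17, 18 n}` 17 or 18 bases of any kind, `{12, 13 W}` 12 or 13 weak bases and `{10 R}` ten purines. The IUPAC codes themselves (n = any base, W = A or T, R = A or G) are standard. The names `T5_EARLY_PROMOTER` and `find_promoter_candidates` are hypothetical.

```python
import re

# Assumed reading of AAA{3, 4 T}nTTGCTT{17, 18 n}TATAATA{12, 13 W}{10 R}.
T5_EARLY_PROMOTER = re.compile(
    r"AAA"             # literal AAA
    r"T{3,4}"          # {3, 4 T}: three or four T residues (assumed reading)
    r"[ACGT]"          # n: any single base
    r"TTGCTT"          # literal TTGCTT
    r"[ACGT]{17,18}"   # {17, 18 n}: spacer of 17 or 18 arbitrary bases
    r"TATAATA"         # literal TATAATA
    r"[AT]{12,13}"     # {12, 13 W}: 12 or 13 weak bases (A or T)
    r"[AG]{10}"        # {10 R}: ten purines (A or G)
)


def find_promoter_candidates(seq):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group(0)) for m in T5_EARLY_PROMOTER.finditer(seq.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to satisfy the pattern; not real T5 sequence.
    synthetic = ("CC" + "AAA" + "TTT" + "G" + "TTGCTT" + "C" * 17
                 + "TATAATA" + "AT" * 6 + "AG" * 5 + "CC")
    print(find_promoter_candidates(synthetic))
```

    A scan of this kind only flags sequence matches on the given strand; it says nothing about promoter strength, which the abstract ties to the host RNA polymerase.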

  1. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  2. TAG Sequence Identification of Genomic Regions Using TAGdb.

    PubMed

    Ruperao, Pradeep

    2016-01-01

    Second-generation sequencing (SGS) technology has enabled the sequencing of genomes and identification of genes. However, large complex plant genomes remain particularly difficult for de novo assembly. Access to the vast quantity of raw sequence data may facilitate discoveries; however, the volume of these data makes access difficult. This chapter discusses the Web-based tool TAGdb, which enables researchers to identify paired-read second-generation DNA sequence data that share identity with a submitted query sequence. The identified reads can be used for PCR amplification of genomic regions to identify genes and promoters without the need for genome assembly. PMID:26519409

  3. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  4. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  5. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs are conserved during evolution; except for one that occurs in both liverworts and mosses, all of these are shared only by the two liverworts, the two mosses or the two hornworts. SSRs are known as DNA tracts with high mutation rates; however, according to our observations, they can still evolve slowly. The conservation of these SSRs suggests that they are under strong selection and could play critical roles in maintaining gene functions. PMID:24491104

  6. Complete mitochondrial genome sequence of Nectogale elegans.

    PubMed

    Huang, Ting; Yan, Chaochao; Tan, Zheng; Tu, Feiyun; Yue, Bisong; Zhang, Xiuyue

    2014-08-01

    The elegant water shrew (Nectogale elegans) belongs to the family Soricidae and is distributed in northern South Asia, central and southern China and northern Southeast Asia. In this study, the complete mitochondrial genome of N. elegans was sequenced. It was determined to be 17,460 bases long and includes 13 protein-coding genes (PCGs), 22 tRNA genes, 2 ribosomal RNA genes and one non-coding region, which is similar to other mammalian mitochondrial genomes. Bayesian inference and maximum likelihood methods were used to construct phylogenetic trees based on 12 heavy-strand concatenated PCGs. Phylogenetic analyses further confirmed that Crocidurinae diverged prior to Soricinae, and Sorex unguiculatus differentiated earlier than N. elegans. PMID:23795853

  7. Initial sequencing and comparative analysis of the mouse genome

    E-print Network

    Hardison, Ross C.

    and knockin techniques17–22. For these and other reasons, the Human Genome Project (HGP) recognized from its ... The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from

  8. Genome, Epigenome and RNA sequences of Monozygotic Twins Discordant for Multiple Sclerosis

    SciTech Connect

    Miller, Neil

    2010-06-02

    Neil Miller, Deputy Director of Software Engineering at the National Center for Genome Resources, discusses a monozygotic twin study on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  9. Genome sequencing in microfabricated high-density picolitre reactors

    E-print Network

    using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show. Large-scale sequencing projects, including whole-genome sequencing, have usually required the cloning or capillary electrophoresis. Current estimates put the cost of sequencing a human genome between $10 million

  10. Genome Sequence of Stachybotrys chartarum Strain 51-11.

    PubMed

    Betancourt, Doris A; Dean, Timothy R; Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful. PMID:26430036

  11. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  12. Identification and annotation of repetitive sequences in fungal genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cheaper and faster sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to the ever-increasing volume of genomic data. This has been paralleled by an increase in computational power and resources to process and translate raw sequence data int...

  13. Whole-Genome Chromatin IP Sequencing (ChIP-Seq)

    E-print Network

    Kopp, Artyom

    ILLUMINA® SEQUENCING Whole-Genome Chromatin IP Sequencing (ChIP-Seq). Illumina ChIP-Seq combines ... -associated proteins. Illumina ChIP-Seq technology precisely and cost-effectively maps global binding sites. The powerful Illumina Whole-Genome Chromatin IP Sequencing (ChIP-Seq) application allows researchers to easily ...

  14. Genome Sequence of Stachybotrys chartarum Strain 51-11

    PubMed Central

    Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful. PMID:26430036

  15. Initial impact of the sequencing of the human genome

    E-print Network

    Massachusetts Institute of Technology. Department of Biology; Broad Institute of MIT and Harvard; Lander, Eric S.

    The sequence of the human genome has dramatically accelerated biomedical research. Here I explore its impact, in the decade since its publication, on our understanding of the biological functions encoded in the genome, on ...

  16. Current challenges in de novo plant genome sequencing and assembly

    PubMed Central

    2012-01-01

    Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community. PMID:22546054

  17. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Draft Genome Sequences of Klebsiella variicola Plant Isolates

    PubMed Central

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena

    2015-01-01

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  19. Reconstructing cancer genomes from paired-end sequencing data

    PubMed Central

    2012-01-01

    Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/. PMID:22537039

  20. Genome sequencing and annotation of Morganella sp. SA36

    PubMed Central

    Selim, Samy; Hassan, Sherif; Hagagy, Nashwa

    2015-01-01

    We report the draft genome sequence of Morganella sp. strain SA36, isolated from a water spring in the Aljouf region, Saudi Arabia. The draft genome size is 2,564,439 bp with a G + C content of 51.1% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDNQ00000000.

  1. Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 12C

    PubMed Central

    Sousa, Thiago Jesus; Mariano, Diego; Parise, Doglas; Parise, Mariana; Viana, Marcus Vinicius Canário; Guimarães, Luis Carlos; Benevides, Leandro Jesus; Rocha, Flávia; Bagano, Priscilla; Ramos, Rommel; Silva, Artur; Figueiredo, Henrique; Almeida, Sintia

    2015-01-01

    We present here the complete genome sequence of Corynebacterium pseudotuberculosis strain 12C, isolated from a sheep abscess in Brazil. The sequencing was performed with the Ion Torrent Personal Genome Machine (PGM) system, a fragment library, and a coverage of ~48-fold. The genome presented is a circular chromosome 2,337,451 bp in length, with 2,119 coding sequences, 12 rRNAs, 49 tRNAs, and a G+C content of 52.83%. PMID:26184935
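
    Genome announcements such as the one above routinely quote a G+C content and a fold coverage. The following is a minimal sketch of how those two summary figures are conventionally computed; it is not taken from the record itself. The function names and the example read total are hypothetical, and ambiguous bases (such as N) are assumed to be excluded from the G+C denominator.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    # Assumption: only A, C, G, T are counted; N and other codes are skipped.
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def fold_coverage(total_sequenced_bases, genome_length):
    """Average depth: total bases in all reads divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_sequenced_bases / genome_length


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    # Hypothetical read total chosen so that a 2,337,451 bp genome
    # comes out at 48-fold coverage.
    print(fold_coverage(48 * 2_337_451, 2_337_451))
```

    Real pipelines usually derive the read total from the sequencer output or from alignment files; here it is simply supplied as a number.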

  2. Rapid whole genome sequencing and precision neonatology.

    PubMed

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow, or perceived as too impractical, to inform initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost of testing decreases and its speed increases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. PMID:26521050

  3. Sequencing and Assembly of the 22-Gb Loblolly Pine Genome

    PubMed Central

    Zimin, Aleksey; Stevens, Kristian A.; Crepeau, Marc W.; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L.; de Jong, Pieter J.; Neale, David B.; Salzberg, Steven L.; Yorke, James A.; Langley, Charles H.

    2014-01-01

    Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. PMID:24653210
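
    The N50 scaffold size quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The sketch below illustrates that definition only. It is not the authors' code, and the scaffold lengths are made-up values in kbp.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths in kbp (not from the pine assembly).
    scaffolds = [80, 70, 50, 40, 30, 20]
    print(n50(scaffolds))
```

    With these toy lengths the total is 290, so the running sum first reaches half (145) at the second-longest scaffold, giving an N50 of 70.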

  4. Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken

    E-print Network

    Campbell, A. Malcolm

    in real time and includes tutorials detailing the complexities of genome assembly. With PHAST, students such as genome assembly. Key Words: Genome assembly; bioinformatics; computational biology; teaching tool. Genome- stand genome sequencing and assembly. Objectives PHAST (Phage Assembly Suite and Tutorial; http

  5. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  6. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution. PMID:26620110

  7. Genome scanning : an AFM-based DNA sequencing technique

    E-print Network

    Elmouelhi, Ahmed (Ahmed M.), 1979-

    2003-01-01

    Genome Scanning is a powerful new technique for DNA sequencing. The method presented in this thesis uses an atomic force microscope with a functionalized cantilever tip to sequence single stranded DNA immobilized to a mica ...

  8. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
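
    The pan- and core-genome calculation mentioned above reduces to set operations once genes have been grouped into families: the pan-genome is the union of gene families across genomes, and the core genome is their intersection. The sketch below assumes that grouping has already been done by some clustering step, which is the hard part in practice and is not shown. The strain names and family identifiers are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome -> families."""
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        raise ValueError("need at least one genome")
    pan = set().union(*family_sets)          # families seen in any genome
    core = set.intersection(*family_sets)    # families shared by every genome
    return pan, core


if __name__ == "__main__":
    # Hypothetical gene-family assignments for three strains.
    toy = {
        "strain_A": {"fam1", "fam2", "fam3"},
        "strain_B": {"fam1", "fam2", "fam4"},
        "strain_C": {"fam1", "fam2", "fam3", "fam5"},
    }
    pan, core = pan_and_core(toy)
    print(len(pan), sorted(core))
```

    As more genomes are added, the core set can only shrink and the pan set can only grow, which is consistent with the contrast in the abstract between about 3100 core families and about 89,000 families in total for E. coli.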

  9. Mapping the Human Reference Genome’s Missing Sequence by Three-Way Admixture in Latino Genomes

    PubMed Central

    Genovese, Giulio; Handsaker, Robert E.; Li, Heng; Kenny, Eimear E.; McCarroll, Steven A.

    2013-01-01

    A principal obstacle to completing maps and analyses of the human genome involves the genome’s “inaccessible” regions: sequences (often euchromatic and containing genes) that are isolated from the rest of the euchromatic genome by heterochromatin and other repeat-rich sequence. We describe a way to localize these sequences by using ancestry linkage disequilibrium in populations that derive ancestry from at least three continents, as is the case for Latinos. We used this approach to map the genomic locations of almost 20 megabases of sequence unlocalized or missing from the current human genome reference (NCBI Genome GRCh37)—a substantial fraction of the human genome’s remaining unmapped sequence. We show that the genomic locations of most sequences that originated from fosmids and larger clones can be admixture mapped in this way, by using publicly available whole-genome sequence data. Genome assembly efforts and future builds of the human genome reference will be strongly informed by this localization of genes and other euchromatic sequences that are embedded within highly repetitive pericentromeric regions. PMID:23932108

  10. Toward product attribute control: developments from genome sequencing.

    PubMed

    Baik, Jong Youn; Lee, Kelvin H

    2014-12-01

    Chinese hamster ovary (CHO) cells are important hosts for the production of therapeutic proteins. Recent genome sequencing studies provide an initial baseline of information useful for understanding cell line performance in terms of product quality attributes. However, the lack of a well-established reference genome, together with concerns about genome stability, has not yet permitted the community to define the detailed relationship between the genome and cell line performance. Emerging efforts to define a new reference genome, together with new data on genome stability, herald an era where cell lines with defined genomes can be combined with defined process parameters to yield product quality attribute control. PMID:24874795

  11. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft'; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the quality of the genome sequence, based on our collective understanding of the different technologies, available assemblers, and the varied efforts to improve upon drafted genomes. Due to the increasingly rapid pace of genomics we avoided the use of rigid numerical thresholds in our definitions to take into account the types of products achieved by any combination of technology, chemistry, assembler, or improvement/finishing process.

  12. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

  13. Volatiles from nineteen recently genome sequenced actinomycetes.

    PubMed

    Citron, Christian A; Barra, Lena; Wink, Joachim; Dickschat, Jeroen S

    2015-03-01

    The volatiles released by agar plate cultures of nineteen actinomycetes whose genomes were recently sequenced were collected by use of a closed-loop stripping apparatus (CLSA) and analysed by GC/MS. In total, 178 compounds from various classes were identified. The most interesting findings were the detection of the insect pheromone frontalin in Streptomyces varsoviensis, and the emission of the unusual plant metabolite 1-nitro-2-phenylethane. Its biosynthesis from phenylalanine was investigated in isotopic labelling experiments. Furthermore, the identified terpenes were correlated to the information about terpene cyclase homologs encoded in the investigated strains. The analytical data were in line with functionally characterised bacterial terpene cyclases and particularly corroborated the recently suggested function of a terpene cyclase from Streptomyces violaceusniger by the identification of a functional homolog in Streptomyces rapamycinicus. PMID:25585196

  14. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory a number of fungi are used as experimental systems to study a wide range of biological processes and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  15. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

  16. On the sequencing of the human genome Robert H. Waterston*

    E-print Network

    Batzoglou, Serafim

    ... The international Human Genome Project (HGP) used the hierarchical shotgun approach, whereas Celera Genomics ... One was the product of the international Human Genome Project (HGP), and the other was the product ... On the sequencing of the human genome. Robert H. Waterston, Eric S. Lander, and John E. Sulston

  17. Complete Genome Sequence of Mycoplasma wenyonii Strain Massachusetts

    PubMed Central

    Guimaraes, Ana M. S.; do Nascimento, Naíla C.; SanMiguel, Phillip J.

    2012-01-01

    Mycoplasma wenyonii is a hemotrophic mycoplasma that causes acute and chronic infections in cattle. Here, we announce the first complete genome sequence of this organism. The genome is a single circular chromosome of 650,228 bp with a G+C content of 33.9%. Analyses of the M. wenyonii genome will provide insights into its biology. PMID:22965086

  18. SEQUENCING THE PIG GENOME USING A BAC BY BAC APPROACH

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have generated a highly contiguous physical map covering >98% of the pig genome in just 176 contigs. The map is localized to the genome through integration with the UIVC RH map as well as BAC end sequence alignments to the human genome. Over 265k HindIII restriction digest fingerprints totaling 16.2...

  19. On the current status of Phakopsora pachyrhizi genome sequencing.

    PubMed

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the current status of Phakopsora pachyrhizi (Asian soybean rust) genome sequencing. PMID:25221558

  20. On the current status of Phakopsora pachyrhizi genome sequencing

    PubMed Central

    Loehrer, Marco; Vogel, Alexander; Huettel, Bruno; Reinhardt, Richard; Benes, Vladimir; Duplessis, Sébastien; Usadel, Björn; Schaffrath, Ulrich

    2014-01-01

    Recent advances in the field of sequencing technologies and bioinformatics allow a more rapid access to genomes of non-model organisms at sinking costs. Accordingly, draft genomes of several economically important cereal rust fungi have been released in the last 3 years. Aside from the very recent flax rust and poplar rust draft assemblies there are no genomic data available for other dicot-infecting rust fungi. In this article we outline rust fungus sequencing efforts and comment on the current status of Phakopsora pachyrhizi (Asian soybean rust) genome sequencing. PMID:25221558

  1. Draft Genome Sequence of Tolypothrix boutellei Strain VB521301

    PubMed Central

    Chandrababunaidu, Mathu Malar; Singh, Deeksha; Sen, Diya; Bhan, Sushma; Das, Subhadeep; Gupta, Akash

    2015-01-01

    We report here the draft genome sequence of the filamentous nitrogen-fixing cyanobacterium Tolypothrix boutellei strain VB521301. The organism is lipid rich and hydrophobic and produces polyunsaturated fatty acids that can be harnessed for industrial purposes. The draft genome sequence assembled into 11,572,263 bp with 70 scaffolds and 7,777 protein-coding genes. PMID:25700407

  2. The Prospects for Sequencing the Western Corn Rootworm Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Historically, obtaining the complete sequence of eukaryotic genomes has been an expensive and complex task. For this reason, efforts to sequence insect genomes have largely been confined to model organisms, species that are important to human health, and representative species from a few insect orde...

  3. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    PubMed Central

    Bayjanov, Jumamurat R.; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; Siezen, Roland; van Hijum, Sacha A. F. T.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence. PMID:26607887

  4. De Novo Genome Sequence of Yersinia aleksiciae Y159T

    PubMed Central

    Neubauer, Heinrich

    2015-01-01

    We report here on the genome sequence of Yersinia aleksiciae Y159T, isolated in Finland in 1981. The genome has a size of 4 Mb, a G+C content of 49%, and is predicted to contain 3,423 coding sequences. PMID:26383649

  5. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    PubMed Central

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J.

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183T (= DSM 16839T = NCIMB 14029T), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host-microbe interaction are identified. PMID:26543119

  6. Draft genome sequence of Kocuria rhizophila P7-4.

    PubMed

    Kim, Woo-Jin; Kim, Young-Ok; Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Lee, Jun-Seo; Kong, Hee Jeong; Nam, Bo-Hye; Kim, Bong-Seok; Lee, Sang-Jun; Park, Hong-Seog; Chae, Sung-Hwa

    2011-08-01

    We report the draft genome sequence of Kocuria rhizophila P7-4, which was isolated from the intestine of Siganus doliatus caught in the Pacific Ocean. The 2.83-Mb genome sequence consists of 75 large contigs (>100 bp in size) and contains 2,462 predicted protein-coding genes. PMID:21685281

  7. Draft Genome Sequence of Kocuria rhizophila P7-4

    PubMed Central

    Kim, Woo-Jin; Kim, Young-Ok; Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Lee, Jun-Seo; Kong, Hee Jeong; Nam, Bo-Hye; Kim, Bong-Seok; Lee, Sang-Jun; Park, Hong-Seog; Chae, Sung-Hwa

    2011-01-01

    We report the draft genome sequence of Kocuria rhizophila P7-4, which was isolated from the intestine of Siganus doliatus caught in the Pacific Ocean. The 2.83-Mb genome sequence consists of 75 large contigs (>100 bp in size) and contains 2,462 predicted protein-coding genes. PMID:21685281

  8. Complete genome sequence of Chinese strain of ‘Candidatus Liberibacter asiaticus’

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of ‘Candidatus Liberibacter asiaticus’ strain (Las) Guangxi-1(GX-1) was obtained by an Illumina HiSeq 2000. The GX-1 genome comprises 1,268,237 nucleotides, 36.5 % GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S ...

  9. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    DOE PAGESBeta

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  10. De Novo Assembly of a Bell Pepper Endornavirus Genome Sequence Using RNA Sequencing Data

    PubMed Central

    Jo, Yeonhwa; Choi, Hoseng

    2015-01-01

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data. PMID:25792042

  11. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data. PMID:25792042

  12. Unexpected cross-species contamination in genome sequencing projects

    PubMed Central

    Merchant, Samier; Wood, Derrick E.

    2014-01-01

    The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes. PMID:25426337

  13. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  14. Genome sequencing and annotation of Serratia sp. strain TEL.

    PubMed

    Lephoto, Tiisetso E; Gray, Vincent M

    2015-12-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000. PMID:26697332

  15. Markov encoding for detecting signals in genomic sequences.

    PubMed

    Rajapakse, Jagath C; Ho, Loi Sy

    2005-01-01

    We present a technique to encode the inputs to neural networks for the detection of signals in genomic sequences. The encoding is based on lower-order Markov models which incorporate known biological characteristics in genomic sequences. The neural networks then learn intrinsic higher-order dependencies of nucleotides at the signal sites. We demonstrate the efficacy of the Markov encoding method in the detection of three genomic signals, namely, splice sites, transcription start sites, and translation initiation sites. PMID:17044178

  16. Genome sequencing and annotation of Serratia sp. strain TEL

    PubMed Central

    Lephoto, Tiisetso E.; Gray, Vincent M.

    2015-01-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000. PMID:26697332

  17. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  18. Cancer whole-genome sequencing: present and future.

    PubMed

    Nakagawa, H; Wardell, C P; Furuta, M; Taniguchi, H; Fujimoto, A

    2015-12-01

    Recent explosive advances in next-generation sequencing technology and computational approaches to massive data enable us to analyze a number of cancer genome profiles by whole-genome sequencing (WGS). To explore cancer genomic alterations and their diversity comprehensively, global and local cancer genome-sequencing projects, including ICGC and TCGA, have been analyzing many types of cancer genomes mainly by exome sequencing. However, there is limited information on somatic mutations in non-coding regions including untranslated regions, introns, regulatory elements and non-coding RNAs, and rearrangements, sometimes producing fusion genes, and pathogen detection in cancer genomes remain widely unexplored. WGS approaches can detect these unexplored mutations, as well as coding mutations and somatic copy number alterations, and help us to better understand the whole landscape of cancer genomes and elucidate functions of these unexplored genomic regions. Analysis of cancer genomes using the present WGS platforms is still primitive and there are substantial improvements to be made in sequencing technologies, informatics and computer resources. Taking account of the extreme diversity of cancer genomes and phenotype, it is also required to analyze much more WGS data and integrate these with multi-omics data, functional data and clinical-pathological data in a large number of sample sets to interpret them more fully and efficiently. PMID:25823020

  19. Scrutinizing Virus Genome Termini by High-Throughput Sequencing

    PubMed Central

    Fan, Huahao; Jiang, Huanhuan; Chen, Yubao; Tong, Yigang

    2014-01-01

    Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS) have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell. PMID:24465717
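
    The idea of spotting termini as high-frequency sequences in the reads can be illustrated with a minimal sketch. It assumes the reads have already been mapped to a draft assembly and that only their start coordinates are kept; the function name, the `min_ratio` threshold, and the use of the mean count as a baseline are hypothetical choices for illustration, not the criterion established in the study.

```python
from collections import Counter


def high_frequency_starts(read_starts, min_ratio=100.0):
    """Return positions where reads start far more often than average.

    read_starts: iterable of integer start coordinates of mapped reads.
    min_ratio:   hypothetical cutoff for (count at position) / (mean count
                 over all positions that have at least one read start).
    """
    counts = Counter(read_starts)
    if not counts:
        return []
    mean_count = sum(counts.values()) / len(counts)
    # Positions whose start count towers over the background are
    # candidate genome termini.
    return sorted(pos for pos, n in counts.items()
                  if n / mean_count >= min_ratio)


# Toy data: uniform background plus a pile-up of read starts at position 500.
starts = list(range(1000)) + [500] * 999
candidates = high_frequency_starts(starts)
```

    In practice both strands would be examined separately, and the ratio between the top frequency and the background would be interpreted against the expected packaging mode; those steps are left out here.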

  20. Doug Brutlag 2015 Sequencing the Human Genome

    E-print Network

    Brutlag, Doug

    Project: Should we do it? · Service, R. F. (2001). The human genome: Objection #1: big biology is bad://www.elec-intro.com/m13-cloning © Doug Brutlag 2015 Public Human Genome Project Strategy Published in Nature 15 The Human Genome Project: How should we do it? · Weber, J. L., & Myers, E. W. (1997). Human whole-genome

  1. Genome Sequence of Tumebacillus flagellatus GST4, the First Genome Sequence of a Species in the Genus Tumebacillus

    PubMed Central

    Wang, Qing-Yan; Huang, Yan-Yan; Song, Li-Fu; Du, Qi-Shi; Yu, Bo; Chen, Dong

    2014-01-01

    We present here the first genome sequence of a species in the genus Tumebacillus. The draft genome sequence of Tumebacillus flagellatus GST4 provides a genetic basis for future studies addressing the origins, evolution, and ecological role of Tumebacillus organisms, as well as a source of acid-resistant amylase-encoding genes for further studies. PMID:25395648

  2. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
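
    Selecting a minimum tiling path of clones can be sketched as a greedy interval-cover problem. The example below is an illustration only: it assumes each clone has already been placed on a linearized genome as a half-open `(start, end)` interval, and the function name and input format are hypothetical rather than taken from the study.

```python
def minimum_tiling_path(clones, genome_length):
    """Pick the fewest clones that cover [0, genome_length).

    clones: list of (start, end) half-open intervals on a linearized genome.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones)
    path = []
    covered = 0  # everything in [0, covered) is already tiled
    i = 0
    while covered < genome_length:
        best = None
        # Among clones starting inside the covered region, keep the one
        # that reaches farthest to the right.
        while i < len(ordered) and ordered[i][0] <= covered:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best)
        covered = best[1]
    return path


clones = [(0, 40), (10, 35), (30, 80), (35, 60), (70, 100)]
path = minimum_tiling_path(clones, 100)
```

    A circular plastid genome would need the interval logic extended to wrap around the origin; the sketch ignores this for brevity.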

  3. BorreliaBase: a phylogeny-centered browser of Borrelia genomes

    PubMed Central

    2014-01-01

    Background: The bacterial genus Borrelia (phylum Spirochaetes) consists of two groups of pathogens represented respectively by B. burgdorferi, the agent of Lyme borreliosis, and B. hermsii, the agent of tick-borne relapsing fever. The number of publicly available Borrelia genomic sequences is growing rapidly with the discovery and sequencing of Borrelia strains worldwide. There is however a lack of dedicated online databases to facilitate comparative analyses of Borrelia genomes. Description: We have developed BorreliaBase, an online database for comparative browsing of Borrelia genomes. The database is currently populated with sequences from 35 genomes of eight Lyme-borreliosis (LB) group Borrelia species and 7 Relapsing-fever (RF) group Borrelia species. Distinct from genome repositories and aggregator databases, BorreliaBase serves manually curated comparative-genomic data including genome-based phylogeny, genome synteny, and sequence alignments of orthologous genes and intergenic spacers. Conclusions: With a genome phylogeny at its center, BorreliaBase allows online identification of hypervariable lipoprotein genes, potential regulatory elements, and recombination footprints by providing evolution-based expectations of sequence variability at each genomic locus. The phylo-centric design of BorreliaBase (http://borreliabase.org) is a novel model for interactive browsing and comparative analysis of bacterial genomes online. PMID:24994456

  4. Community-wide analysis of microbial genome sequence signatures

    PubMed Central

    Dick, Gregory J; Andersson, Anders F; Baker, Brett J; Simmons, Sheri L; Thomas, Brian C; Yelton, A Pepper; Banfield, Jillian F

    2009-01-01

    Background: Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them. Results: We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases. Conclusions: An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities. PMID:19698104
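
    A tetranucleotide-frequency signature of the kind mentioned above can be computed in a few lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, windows containing characters other than A, C, G, T are skipped, and reverse-complement merging and the self-organizing-map clustering step used in the study are not shown.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of all 256 tetranucleotides in seq."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Normalize only over windows made up entirely of A, C, G, T.
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


signature = tetranucleotide_frequencies("ACGTACGT")
```

    Each fragment then becomes a 256-dimensional vector, and fragments with similar vectors can be grouped by whatever clustering method is preferred.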

  5. Complete genome sequence of Mycoplasma haemofelis, a hemotropic mycoplasma.

    PubMed

    Barker, Emily N; Helps, Chris R; Peters, Iain R; Darby, Alistair C; Radford, Alan D; Tasker, Séverine

    2011-04-01

    Here, we present the genome sequence of Mycoplasma haemofelis strain Langford 1, representing the first hemotropic mycoplasma (hemoplasma) species to be completely sequenced and annotated. Originally isolated from a cat with hemolytic anemia, this strain induces severe hemolytic anemia when inoculated into specific-pathogen-free-derived cats. The genome sequence has provided insights into the biology of this uncultivatable hemoplasma and has identified potential molecular mechanisms underlying its pathogenicity. PMID:21317334

  6. Complete Genome Sequence of Mycoplasma haemofelis, a Hemotropic Mycoplasma

    PubMed Central

    Barker, Emily N.; Helps, Chris R.; Peters, Iain R.; Darby, Alistair C.; Radford, Alan D.; Tasker, Séverine

    2011-01-01

    Here, we present the genome sequence of Mycoplasma haemofelis strain Langford 1, representing the first hemotropic mycoplasma (hemoplasma) species to be completely sequenced and annotated. Originally isolated from a cat with hemolytic anemia, this strain induces severe hemolytic anemia when inoculated into specific-pathogen-free-derived cats. The genome sequence has provided insights into the biology of this uncultivatable hemoplasma and has identified potential molecular mechanisms underlying its pathogenicity. PMID:21317334

  7. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Yang, Xiaohan; Ye, Chuyu; Tuskan, Gerald A

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  8. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C.; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chuyu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela; Panaud, Olivier; Kellogg, Elizabeth A.; Brutnell, Thomas P.; Doust, Andrew N.; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  9. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    PubMed Central

    Deakin, Janine E.

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  10. Marsupial genome sequences: providing insight into evolution and disease.

    PubMed

    Deakin, Janine E

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  11. Microbial genome sequencing using optical mapping and Illumina sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  12. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Accurate Whole Genome Sequencing as the Ultimate Genetic Test

    E-print Network

    Church, George M.

    Accurate Whole Genome Sequencing as the Ultimate Genetic Test Radoje Drmanac,1,2* Brock A. Peters,1- assembling DNA nanoarrays. Science 2010;327:78-81.4 Even 30 years ago, it was obvious that Sanger sequenc that started in Serbia in 1987 with a proposal for sequencing by hybridization (SBH) on dot-blot DNA arrays

  17. The genome sequence of Schizosaccharomyces pombe.

    PubMed

    Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

    2002-02-21

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization. PMID:11859360

  18. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

  19. Complete genome sequence of Staphylothermus hellenicus P8T

    PubMed Central

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne; Pitluck, Samuel; Davenport, Karen; Detter, John C.; Han, Cliff; Tapia, Roxanne; Land, Miriam; Hauser, Loren; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos; Ivanova, Natalia

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project. PMID:22180806

  20. Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens

    PubMed Central

    Staats, Martijn; Erkens, Roy H. J.; van de Vossenberg, Bart; Wieringa, Jan J.; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E.; Bakker, Freek T.

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2–71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. 
    We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well. PMID:23922691

  1. Sequencing, assembling, and correcting draft genomes using recombinant populations.

    PubMed

    Hahn, Matthew W; Zhang, Simo V; Moyle, Leonie C

    2014-04-01

    Current de novo whole-genome sequencing approaches often are inadequate for organisms lacking substantial preexisting genetic data. Problems with these methods are manifest as: large numbers of scaffolds that are not ordered within chromosomes or assigned to individual chromosomes, misassembly of allelic sequences as separate loci when the individual(s) being sequenced are heterozygous, and the collapse of recently duplicated sequences into a single locus, regardless of levels of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences, which we call recombinant population genome construction, that solves many of the problems encountered in standard genome assembly and that can be applied in model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to simultaneously barcode and sequence a large number of individuals from a recombinant population. The sequences of all recombinants can be combined to create an initial de novo assembly, followed by the use of individual recombinant genotypes to correct assembly splitting/collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can rapidly accelerate the transformation of nonmodel species into genome-enabled systems by simultaneously producing a high-quality genome assembly and providing genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, this approach also enables simultaneous mapping of quantitative trait loci. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that the method can produce a high-fidelity, high-quality genome assembly for both parents of the cross. PMID:24531727

  2. Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato

    PubMed Central

    Macias-Muñoz, Aide; Briscoe, Adriana D.

    2014-01-01

    Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145
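
    The two lengths reported above are consistent with each other, since 8,895 nucleotides is exactly 2,965 codons. A minimal Python sketch of that check follows. The helper name `codons_in_cds` is hypothetical, and the abstract does not say whether the reported CDS length includes a stop codon, so the sketch only counts codons.

```python
def codons_in_cds(cds_length_nt):
    """Return the number of complete codons in a coding sequence.

    Raises ValueError if the length is not a multiple of three,
    which would indicate a truncated or frame-shifted CDS.
    """
    if cds_length_nt <= 0 or cds_length_nt % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    return cds_length_nt // 3


# Figures taken from the abstract above.
print(codons_in_cds(8895))  # 2965
```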

  3. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    SciTech Connect

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  4. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    SciTech Connect

    Abt, Birte; Foster, Brian; Lapidus, Alla L.; Clum, Alicia; Sun, Hui; Pukall, Rudiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  6. Nucleotide sequences flanking dinucleotide microsatellites in the human, mouse and Drosophila genomes.

    PubMed

    Matula, M; Kypr, J

    1999-10-01

    We extracted nucleotide sequences from the EMBL database that flank dinucleotide microsatellites in the long sequenced parts of the human, mouse and Drosophila genomes. Comparison of the flanking sequences showed that the microsatellites were mostly connected to the bulk of genomic DNA through conserved, highly non-random and mostly (A+T)-rich sequences many dozens of nucleotides in length. In many cases, the connectors were mutated versions of the flanked microsatellites whose sequence pattern gradually vanished with distance from the microsatellite center. Hence many microsatellites are hundreds rather than dozens of nucleotides long, and their ends are diffuse. In contrast, some microsatellites containing predominantly C and/or G did not influence their neighborhood at all. These results change our notions about the nature of microsatellites. They also indicate that the microsatellites are the dominant part of eukaryotic genomes. PMID:10563577

  7. The Release 6 reference sequence of the Drosophila melanogaster genome.

    PubMed

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  8. The Release 6 reference sequence of the Drosophila melanogaster genome

    PubMed Central

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  9. Complete genome sequences of six strains of the genus Methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; Farhan Ul Haque, Muhammad; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Aguero, Fernan; Goodwin, Lynne A.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  10. Complete Genome Sequences of Six Strains of the Genus Methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; Ul Haque, Muhammad Farhan; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Goodwin, Lynne A.; Ivanov, Pavel S.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  11. The genome sequence of the filamentous fungus Neurospora crassa 

    E-print Network

    Read, Nick D; et al

    2003-04-24

    Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase ...

  12. Draft Genome Sequence of Pseudomonas syringae pv. persicae NCPPB 2254.

    PubMed

    Zhao, Wenjun; Jiang, Hongshan; Tian, Qian; Hu, Jie

    2015-01-01

    Pseudomonas syringae pv. persicae is a pathogen that causes bacterial decline of stone fruit. Here, we report the draft genome sequence for P. syringae pv. persicae, which was isolated from Prunus persica. PMID:26044420

  13. Edinburgh Research Explorer Draft Genome Sequences of Six Different Staphylococcus

    E-print Network

    Millar, Andrew J.

    Draft Genome Sequences of Six Different Staphylococcus epidermidis Clones, Isolated Individually from Preterm Neonates Presenting with Sepsis

  14. Melanoma genome sequencing reveals frequent PREX2 mutations

    E-print Network

    Lander, Eric S.

    Melanoma is notable for its metastatic propensity, lethality in the advanced setting and association with ultraviolet exposure early in life. To obtain a comprehensive genomic view of melanoma in humans, we sequenced the ...

  15. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    SciTech Connect

    Martinez, Robert J; Bruce, David; Detter, J C; Goodwin, Lynne A.; Han, James; Han, Cliff; Held, Brittany; Land, Miriam L; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  16. Draft Genome Sequences of Three Mycobacterium chimaera Respiratory Isolates

    PubMed Central

    Roycroft, Emma; Raftery, Philomena; Mok, Simone; Fitzgibbon, Margaret; Rogers, Thomas R.

    2015-01-01

    Mycobacterium chimaera is an opportunistic human pathogen implicated in both pulmonary and cardiovascular infections. Here, we report the draft genome sequences of three strains isolated from human respiratory specimens. PMID:26634757

  17. Sequence analysis of the complete mitochondrial genome of Youxian sheldrake.

    PubMed

    He, Shao-Ping; Liu, Li-Li; Yu, Qi-Fang; Li, Si; He, Jian-Hua

    2016-03-01

    Youxian sheldrake is an excellent native breed of Hunan province, China. The complete mitochondrial (mt) genome sequence plays an important role in the accurate determination of phylogenetic relationships among metazoans. This is the first study to determine the complete mitochondrial genome sequence of Youxian sheldrake using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail. The total length of the mitogenome is 16,605 bp, with a base composition of 29.21% A, 22.18% T, 32.84% C and 15.77% G. It contains 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of Youxian sheldrake provides important data for further study of the phylogenetics of poultry, as well as useful data for genetics and breeding. PMID:25090395
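
    The base composition reported above can be sanity-checked, and the same kind of figure computed for any sequence, with a few lines of Python. This is only an illustrative sketch: `base_composition` is a hypothetical helper, the toy sequence is made up, and only the four percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, C and G in a DNA sequence.

    Characters other than A/C/G/T (e.g. N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


# Percentages as reported in the abstract above.
reported = {"A": 29.21, "T": 22.18, "C": 32.84, "G": 15.77}
total_percent = round(sum(reported.values()), 2)
gc_percent = round(reported["G"] + reported["C"], 2)

print(total_percent)  # 100.0
print(gc_percent)     # 48.61

# Toy sequence (illustration only, not from the study).
print(base_composition("AACG"))
```

    The reported percentages sum to 100%, and the implied G+C content is 48.61%.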

  18. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    PubMed Central

    Li, Lijin; Goedegebuure, Peter; Mardis, Elaine R.; Ellis, Matthew J.C.; Zhang, Xiuli; Herndon, John M.; Fleming, Timothy P.; Carreno, Beatriz M.; Hansen, Ted H.; Gillanders, William E.

    2011-01-01

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines. PMID:24213133

  19. Commentary on patents: Full bacterial DNA sequences boost genomics

    SciTech Connect

    Fox, J.L.

    1995-07-01

    Together with recent U.S. federal court decisions on DNA patenting, the sequencing achievement indicates that efforts on the broader genomics front may be moving more rapidly than had been previously thought.

  20. Initial genome sequencing and analysis of multiple myeloma

    E-print Network

    Lander, Eric S.

    Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. ...

  1. Draft Genome Sequence of Coprobacter fastidiosus NSB1T

    PubMed Central

    Chaplin, A. V.; Efimov, B. A.; Khokhlova, E. V.; Kafarskaia, L. I.; Tupikin, A. E.; Kabilov, M. R.

    2014-01-01

    Coprobacter fastidiosus is a Gram-negative obligate anaerobic bacterium belonging to the phylum Bacteroidetes. In this work, we report the draft genome sequence of C. fastidiosus strain NSB1T isolated from human infant feces. PMID:24604645

  2. Fulfilling the Promise of a Sequenced Human Genome – Part II

    SciTech Connect

    Green, Eric

    2009-05-27

    Eric Green, scientific director of the National Human Genome Research Institute (NHGRI), gives the opening keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM on May 27, 2009. Part 2 of 2

  3. Fulfilling the Promise of a Sequenced Human Genome – Part I

    SciTech Connect

    Green, Eric

    2009-05-27

    Eric Green, scientific director of the National Human Genome Research Institute (NHGRI), gives the opening keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM on May 27, 2009. Part 1 of 2

  4. Complete genome sequence of Treponema pallidum strain DAL-1.

    PubMed

    Zobaníková, Marie; Mikolka, Pavol; Cejková, Darina; Pospíšilová, Petra; Chen, Lei; Strouhal, Michal; Qin, Xiang; Weinstock, George M; Smajs, David

    2012-10-10

    Treponema pallidum strain DAL-1 is a human uncultivable pathogen causing the sexually transmitted disease syphilis. Strain DAL-1 was isolated from the amniotic fluid of a pregnant woman in the secondary stage of syphilis. Here we describe the 1,139,971 bp long genome of T. pallidum strain DAL-1 which was sequenced using two independent sequencing methods (454 pyrosequencing and Illumina). In rabbits, strain DAL-1 replicated better than the T. pallidum strain Nichols. The comparison of the complete DAL-1 genome sequence with the Nichols sequence revealed a list of genetic differences that are potentially responsible for the increased rabbit virulence of the DAL-1 strain. PMID:23449808

  5. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    PubMed

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus. PMID:25252813

  6. Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

    PubMed Central

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ~200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021
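
    For readers unfamiliar with the N50 statistic quoted above, the following minimal Python sketch shows how it is commonly computed: the N50 is the length L such that sequences of length L or longer together cover at least half of the assembly. The function name `n50` and the contig lengths are assumptions made for illustration and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig/scaffold lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustration only).
example = [8000, 6000, 5000, 3000, 2000, 1000]
print(n50(example))  # 6000
```

    In this toy assembly the total is 25,000 bp, and the two longest contigs (8,000 + 6,000 bp) already cover more than half of it, so the N50 is 6,000 bp.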

  7. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.
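
    The idea in the record above (score variation at each nucleotide across individuals, then look for stretches with little variation) can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function names, the variability score (one minus the frequency of the most common base in each alignment column), the window size and the threshold are all assumptions, and the input is assumed to be a gap-free alignment of equal-length sequences.

```python
from collections import Counter


def column_variability(alignment):
    """Per-column variability for equal-length aligned sequences.

    The score is 1 minus the frequency of the most common base in the
    column (an assumed, simplified stand-in for a per-site mutation rate).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(1 - top_count / len(column))
    return scores


def low_variation_windows(scores, window, threshold):
    """Start positions of windows whose mean variability is <= threshold."""
    starts = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window <= threshold:
            starts.append(i)
    return starts


# Hypothetical toy alignment of four individuals (not data from the study).
toy_alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "ACGTAC"]
toy_scores = column_variability(toy_alignment)
print(toy_scores)
print(low_variation_windows(toy_scores, window=3, threshold=0.0))
```

    In this toy case only the first three columns are invariant, so a single window starting at position 0 is reported. A real analysis would also need to handle alignment gaps, sequencing errors and uneven sampling.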

  8. Complete Genome Sequences of Helicobacter pylori Rifampin-Resistant Strains.

    PubMed

    Momynaliev, Kuvat; Chelysheva, Vera; Selezneva, Oksana; Akopian, Tatyana; Alexeev, Dmitry; Govorun, Vadim

    2013-01-01

    Here we present the complete genome sequences of two Helicobacter pylori rifampin-resistant (Rif(r)) strains (Rif1 and Rif2). Rif(r) strains were obtained by in vitro selection of H. pylori 26695 on agar plates with 20 µg/ml rifampin. The genome data provide insights on the genomic diversity of H. pylori under selection by rifampin. PMID:23833139

  9. Mulan: multiple-sequence alignment to predict functional elements in genomic sequences.

    PubMed

    Loots, Gabriela G; Ovcharenko, Ivan

    2007-01-01

    Multiple sequence alignment analysis is a powerful approach for translating the evolutionary selective power into phylogenetic relationships to localize functional coding and noncoding genomic elements. The tool Mulan (http://mulan.dcode.org/) has been designed to effectively perform multiple comparisons of genomic sequences necessary to facilitate bioinformatic-driven biological discoveries. The Mulan network server is capable of comparing both closely and distantly related genomes to identify conserved elements over a broad range of evolutionary time. Several novel algorithms are brought together in this tool: the tba multisequence aligner program used to rapidly identify local sequence conservation and the multiTF program to detect evolutionarily conserved transcription factor binding sites in alignments. Mulan is integrated with the ECR Browser and the UCSC Genome Browser for quick uploads of available sequences, and supports two-way communication with the GALA database to overlay GALA functional genome annotation with sequence conservation profiles. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Recently, we have also introduced the ability to handle duplications to permit the reliable reconstruction of evolutionary events that underlie the genome sequence data. Here, we describe the main features of the Mulan tool that include the interactive modification of critical conservation parameters, visualization options, and dynamic access to sequence data from visual graphs for flexible and easy-to-perform analysis of differentially evolving genomic regions. PMID:17993678

  10. Complete Genome Sequence of Mycoplasma synoviae Strain WVU 1853T

    PubMed Central

    Kutish, Gerald F.; Barbet, Anthony F.; Michaels, Dina L.

    2015-01-01

    A hybrid sequence assembly of the complete Mycoplasma synoviae type strain WVU 1853T genome was compared to that of strain MS53. The findings support prior conclusions about M. synoviae, based on the genome of that otherwise uncharacterized field strain, and provide the first evidence of epigenetic modifications in M. synoviae. PMID:26021934

  11. Mitochondrial Genome Sequence of the Glass Sponge Oopsacas minuta.

    PubMed

    Jourda, Cyril; Santini, Sébastien; Rocher, Caroline; Le Bivic, André; Claverie, Jean-Michel

    2015-01-01

    We report the complete mitochondrial genome sequence of the Mediterranean glass sponge Oopsacas minuta. This 19-kb mitochondrial genome has 24 noncoding genes (22 tRNAs and 2 rRNAs) and 14 protein-encoding genes coding for 11 subunits of respiratory chain complexes and 3 ATP synthase subunits. PMID:26227597

  12. A snapshot of the emerging tomato genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

  13. Complete Genome Sequence of Campylobacter gracilis ATCC 33236T

    PubMed Central

    Yee, Emma

    2015-01-01

    The human oral pathogen Campylobacter gracilis has been isolated from periodontal and endodontal infections, and also from nonoral head, neck, or lung infections. This study describes the whole-genome sequence of the human periodontal isolate ATCC 33236T (=FDC 1084), which is the first closed genome for C. gracilis. PMID:26383656

  14. Mitochondrial Genome Sequence of the Glass Sponge Oopsacas minuta

    PubMed Central

    Jourda, Cyril; Santini, Sébastien; Rocher, Caroline; Le Bivic, André

    2015-01-01

    We report the complete mitochondrial genome sequence of the Mediterranean glass sponge Oopsacas minuta. This 19-kb mitochondrial genome has 24 noncoding genes (22 tRNAs and 2 rRNAs) and 14 protein-encoding genes coding for 11 subunits of respiratory chain complexes and 3 ATP synthase subunits. PMID:26227597

  15. Draft Genome Sequence of Mycobacterium austroafricanum DSM 44191.

    PubMed

    Croce, Olivier; Robert, Catherine; Raoult, Didier; Drancourt, Michel

    2014-01-01

    We announce the draft genome sequence of Mycobacterium austroafricanum DSM 44191(T) (= E9789-SA12441(T)), a non-tuberculosis species responsible for opportunistic infection. The genome described here has a size of 6,772,357 bp with a G+C content of 66.79% and contains 6,419 protein-coding genes and 112 RNA genes. PMID:24744336

  16. Sequence and comparative analysis of the chicken genome provide unique

    E-print Network

    Hardison, Ross C.

    and contraction of multigene families seem to have been major factors in the independent evolution of mammals evolution International Chicken Genome Sequencing Consortium* *Lists of participants and affiliations appear and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also

  17. Genomic regulatory regions: insights from comparative sequence analysis

    E-print Network

    Sidow, Arend

    Genomic regulatory regions: insights from comparative sequence analysis Gregory M Cooper and Arend of genomic regulatory regions with functional roles. It is effective because functionally important regions for the comprehensive discovery of human regulatory elements. Addresses Department of Genetics, Stanford University

  18. RESEARCH Open Access Genomic and small RNA sequencing of

    E-print Network

    Green, Pamela

    of sorghum as a reference genome sequence for Andropogoneae grasses Kankshita Swaminathan1,2 , Magdy origins of Mxg, and suggest that while the repeat content of Mxg differs from sorghum, the sorghum genome. Included within the Andropogoneae are major crops such as maize, Sorghum bicolor (sorghum), sugarcane

  19. Draft Genome Sequence of "Candidatus Liberibacter asiaticus" from California.

    PubMed

    Zheng, Z; Deng, X; Chen, J

    2014-01-01

    We report here the draft genome sequence of "Candidatus Liberibacter asiaticus" strain HHCA, collected from a lemon tree in California. The HHCA strain has a genome size of 1,150,620 bp, 36.5% G+C content, 1,119 predicted open reading frames, and 51 RNA genes. PMID:25278540

  20. Draft Genome Sequence of “Candidatus Liberibacter asiaticus” from California

    PubMed Central

    Zheng, Z.

    2014-01-01

    We report here the draft genome sequence of “Candidatus Liberibacter asiaticus” strain HHCA, collected from a lemon tree in California. The HHCA strain has a genome size of 1,150,620 bp, 36.5% G+C content, 1,119 predicted open reading frames, and 51 RNA genes. PMID:25278540

  1. Draft Genome Sequence of Linfuranone Producer Microbispora sp. GMKU 363.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Thamchaipenet, Arinthip; Igarashi, Yasuhiro

    2015-01-01

    Here, we report the draft genome sequence of Microbispora sp. GMKU 363, a plant-derived actinomycete that produces linfuranone A, a linear polyketide modified with a furanone ring possessing adipocyte differentiation inducing activity. The biosynthetic gene cluster for linfuranone was identified by analyzing polyketide synthase genes in the genome. PMID:26659694

  2. First Complete Genome Sequence of Felis catus Gammaherpesvirus 1

    PubMed Central

    Lee, Justin S.; Vuyisich, Momchilo; Chain, Patrick; Lo, Chien-Chi; Kronmiller, Brent; Bracha, Shay; Avery, Anne C.; VandeWoude, Sue

    2015-01-01

    We sequenced the complete genome of Felis catus gammaherpesvirus 1 (FcaGHV1) from lymph node DNA of an infected cat. The genome includes a 121,556-nucleotide unique region with 87 predicted open reading frames (61 gammaherpesvirus conserved and 26 unique) flanked by multiple copies of a 966-nucleotide terminal repeat. PMID:26543105

  3. Multiplexed DNA Sequence Capture of Mitochondrial Genomes Using PCR Products

    E-print Network

    Pääbo, Svante

    Multiplexed DNA Sequence Capture of Mitochondrial Genomes Using PCR Products Tomislav Maricic products are used to capture complete human mitochondrial genomes from complex DNA mixtures. We use. It has applications in population genetics and forensics, as well as studies of ancient DNA. Citation

  4. Draft Genome Sequence of Linfuranone Producer Microbispora sp. GMKU 363

    PubMed Central

    Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Thamchaipenet, Arinthip; Igarashi, Yasuhiro

    2015-01-01

    Here, we report the draft genome sequence of Microbispora sp. GMKU 363, a plant-derived actinomycete that produces linfuranone A, a linear polyketide modified with a furanone ring possessing adipocyte differentiation inducing activity. The biosynthetic gene cluster for linfuranone was identified by analyzing polyketide synthase genes in the genome. PMID:26659694

  5. Draft Genome Sequence of Entomopathogenic Serratia liquefaciens Strain FK01

    PubMed Central

    Taira, Erika; Mon, Hiroaki; Mori, Kazuki; Akasaka, Taiki; Tashiro, Kousuke; Yasunaga-Aoki, Chisa; Lee, Jae Man; Kusakabe, Takahiro

    2014-01-01

    In the present study, we determined the draft genome sequence of the entomopathogenic bacterium Serratia liquefaciens FK01, which is highly virulent to the silkworm. The draft genome is ~5.28 Mb in size, and the G+C content is 55.8%. PMID:24970828

  6. Draft Genome Sequence of Corynebacterium pseudodiphtheriticum Strain 090104 "Sokolov".

    PubMed

    Karlyshev, Andrey V; Melnikov, Vyacheslav G

    2013-01-01

    This report describes the first draft genome sequence of a Corynebacterium pseudodiphtheriticum strain. The information on the genome organization and putative gene products will assist in better understanding of the molecular mechanisms involved in the beneficial probiotic effects of this bacterium. PMID:24201200

  7. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  8. Draft Genome Sequences of 10 Strains of the Genus Exiguobacterium

    PubMed Central

    Chauhan, Archana; Layton, Alice C.; Pfiffner, Susan M.; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C.; Markowitz, Victor M.; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W.; Pati, Amrita; Stamatis, Dimitrios; Reddy, T. B. K.; Shapiro, Nicole; Nordberg, Henrik P.; Cantor, Michael N.; Hua, X. Susan; Woyke, Tanja

    2014-01-01

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes. PMID:25323723

  9. Draft genome sequences of 10 strains of the genus Exiguobacterium.

    PubMed

    Vishnivetskaya, Tatiana A; Chauhan, Archana; Layton, Alice C; Pfiffner, Susan M; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C; Markowitz, Victor M; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W; Pati, Amrita; Stamatis, Dimitrios; Reddy, T B K; Shapiro, Nicole; Nordberg, Henrik P; Cantor, Michael N; Hua, X Susan; Woyke, Tanja

    2014-01-01

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes. PMID:25323723

  10. Genome Sequence of Type Strain Lysinibacillus macroides DSM 54T

    PubMed Central

    Liu, Guo-hong; Wang, Jie-ping; Che, Jian-Mei; Chen, Qian-Qian; Chen, Zheng; Ge, Ci-bin

    2015-01-01

    Lysinibacillus macroides DSM 54T is a Gram-positive, spore-forming bacterium. Here, we report the 4,866,035-bp genome sequence of Lysinibacillus macroides DSM 54T, which will accelerate its application in degrading xylan and provide useful information for genomic taxonomy and phylogenomics of Bacillus-like bacteria. PMID:26543111

  11. Complete genome sequence of pronghorn virus, a pestivirus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  12. Whole Genome and Transcriptome Sequencing of a B3 Thymoma

    PubMed Central

    Petrini, Iacopo; Rajan, Arun; Pham, Trung; Voeller, Donna; Davis, Sean; Gao, James; Wang, Yisong; Giaccone, Giuseppe

    2013-01-01

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma. PMID:23577124

  13. Draft genome sequence of Thermincola potens strain JR

    SciTech Connect

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  14. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. 
Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which

  15. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
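
    For readers unfamiliar with the N50 statistic mentioned in this record, a minimal sketch of its usual definition follows: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name and the contig lengths are hypothetical and are not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches at least half of
    the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 300, so the halfway mark is 150).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 reaches 150, so this prints 70
```

    Because N50 depends only on the length distribution, it says nothing about whether the contigs are correctly assembled, so it is normally read alongside other quality measures.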

  16. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576

  17. A 454 sequencing approach to dipteran mitochondrial genome research.

    PubMed

    Ramakodi, Meganathan P; Singh, Baneshwar; Wells, Jeffrey D; Guerrero, Felix; Ray, David A

    2015-01-01

    The availability of complete mitochondrial genome (mtgenome) data for Diptera, one of the largest metazoan orders, in public databases is limited. The advent of high throughput sequencing technology provides the potential to generate mtgenomes for many species affordably and quickly. However, these technologies need to be validated for dipterans as the members of this clade play important economic and research roles. Illumina and 454 sequencing platforms are widely used in genomic research involving non-model organisms. The Illumina platform has already been utilized for generating mitochondrial genomes without using conventional long range PCR for insects whereas the power of 454 sequencing for generating mitochondrial genome drafts without PCR has not yet been validated for insects. Thus, this study examines the utility of 454 sequencing approach for dipteran mtgenomic research. We generated complete or nearly complete mitochondrial genomes for Cochliomyia hominivorax, Haematobia irritans, Phormia regina and Sarcophaga crassipalpis using a 454 sequencing approach. Comparisons between newly obtained and existing assemblies for C. hominivorax and H. irritans revealed no major discrepancies and verified the utility of 454 sequencing for dipteran mitochondrial genomes. We also report the complete mitochondrial sequences for two forensically important flies, P. regina and S. crassipalpis, which could be used to provide useful information to legal personnel. Comparative analyses revealed that dipterans follow similar codon usage and nucleotide biases that could be due to mutational and selection pressures. This study illustrates the utility of 454 sequencing to obtain complete mitochondrial genomes for dipterans without the aid of conventional molecular techniques such as PCR and cloning and validates this method of mtgenome sequencing in arthropods. PMID:25451744

  18. Sequencing and comparing whole mitochondrial genomes of animals.

    PubMed

    Boore, Jeffrey L; Macey, J Robert; Medina, Mónica

    2005-01-01

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, which can be especially powerful. We describe here the protocols commonly used for physically isolating mitochondrial DNA (mtDNA), for amplifying these by polymerase chain reaction (PCR) or rolling circle amplification (RCA), for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences with determining and comparing complete mitochondrial DNA sequences. PMID:15865975

  19. Single Nucleotide Polymorphism Mapping Using Genome-Wide Unique Sequences

    PubMed Central

    Chen, Leslie Y.Y.; Lu, Szu-Hsien; Shih, Edward S.C.; Hwang, Ming-Jing

    2002-01-01

    As more and more genomic DNAs are sequenced to characterize human genetic variations, the demand for a very fast and accurate method to genomically position these DNA sequences is high. We have developed a new mapping method that does not require sequence alignment. In this method, we first identified DNA fragments of 15 bp in length that are unique in the human genome and then used them to position single nucleotide polymorphism (SNP) sequences. By use of four desktop personal computers with AMD K7 (1 GHz) processors, our new method mapped more than 1.6 million SNP sequences in 20 hr and achieved a very good agreement with mapping results from alignment-based methods. PMID:12097348
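
    The alignment-free approach described in this record (find short fragments that occur exactly once in the genome, then use them as anchors to position query sequences) can be illustrated with a toy sketch. This is an assumption-laden simplification rather than the published method: the function names are hypothetical, only the forward strand is indexed, a small k is used in place of the 15-bp fragments, and an in-memory dictionary like this would not scale to a human genome without a more compact data structure.

```python
from collections import defaultdict


def unique_kmer_index(genome, k):
    """Map each k-mer that occurs exactly once in `genome` to its position."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def locate(query, index, k):
    """Return the genome offset best supported by unique k-mers in `query`.

    Each unique k-mer found in the query votes for the genome position at
    which the query would have to start; None is returned if no k-mer hits.
    """
    votes = defaultdict(int)
    for j in range(len(query) - k + 1):
        hit = index.get(query[j:j + k])
        if hit is not None:
            votes[hit - j] += 1
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical toy genome; "ACGT" occurs twice, so it is not in the index.
toy_genome = "ACGTACGTTTGACCA"
toy_index = unique_kmer_index(toy_genome, k=4)
print(locate("GTTTGAC", toy_index, k=4))  # exact match starting at offset 6
print(locate("GTTTCAC", toy_index, k=4))  # one mismatch, still anchored at 6
print(locate("ACGT", toy_index, k=4))     # repeated k-mer only, so None
```

    The second query shows why this suits SNP-containing sequences: k-mers overlapping the variant base fail to match, but flanking unique k-mers still anchor the sequence.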

  20. Genome sequence of the date palm Phoenix dactylifera L.

    PubMed

    Al-Mssallem, Ibrahim S; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O; Jia, Shangang; Yin, An; Alhuzimi, Eman M; Alsaihati, Burair A; Al-Owayyed, Saad A; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A; Sun, Gaoyuan; Majrashi, Majed A; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

    2013-01-01

    Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm's unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants. PMID:23917264

  1. Complete mitochondrial genome sequence of Aoluguya reindeer (Rangifer tarandus).

    PubMed

    Ju, Yan; Liu, Huamiao; Rong, Min; Yang, Yifeng; Wei, Haijun; Shao, Yuanchen; Chen, Xiumin; Xing, Xiumei

    2014-12-01

    The complete mitochondrial genome of the reindeer, Rangifer tarandus, was determined by accurate polymerase chain reaction. The entire genome is 16,357 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a D-loop region, all of which are arranged in a typical vertebrate manner. The overall base composition of the reindeer's mitochondrial genome is 33.7% A, 23.1% C, 30.1% T and 13.2% G. A termination associated sequence and several conserved central sequence block domains were discovered within the control region. PMID:25469816
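
    Base composition figures like those reported in this record are simple per-base percentages, and the reported C and G values together imply a G+C content of roughly 36.3%. A minimal sketch of the calculation is given below; the function names and the toy sequence are hypothetical, and ambiguous symbols such as N are simply ignored, which is an assumption rather than a universal convention.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases in `seq`."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """G+C content as a percentage of the unambiguous bases."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


# Hypothetical toy sequence, not the reindeer mitochondrial genome.
toy_seq = "AACCGGTTAA"
print(base_composition(toy_seq))
print(gc_content(toy_seq))
```

    The same two functions apply to the G+C percentages quoted in the bacterial draft genome records elsewhere in this listing.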

  2. Complete genome sequence of Serratia plymuthica strain AS12

    SciTech Connect

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne A.; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla L.; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, J. Chris; Land, Miriam L; Hauser, Loren John; Cheng, Jan-Fang; Ivanova, N; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  3. Complete genome sequence of Ferroglobus placidus AEDII12DO

    SciTech Connect

    Anderson, Iain; Risso, Carla; Holmes, Dawn; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Saunders, Elizabeth H; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Larimer, Frank W; Land, Miriam L; Hauser, Loren John; Woyke, Tanja; Lovley, Derek; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

  4. RESTseq – Efficient Benchtop Population Genomics with RESTriction Fragment SEQuencing

    PubMed Central

    Stolle, Eckart; Moritz, Robin F. A.

    2013-01-01

    We present RESTseq, an improved approach for a cost efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping from fine-scale linkage mapping to population genomics and population genetics also in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples. PMID:23691128

  5. RESTseq--efficient benchtop population genomics with RESTriction Fragment SEQuencing.

    PubMed

    Stolle, Eckart; Moritz, Robin F A

    2013-01-01

    We present RESTseq, an improved approach for a cost efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping from fine-scale linkage mapping to population genomics and population genetics also in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples. PMID:23691128

  6. Massively parallel sequencing: the new frontier of hematologic genomics

    PubMed Central

    Nickerson, Deborah A.; Reiner, Alex P.

    2013-01-01

    Genomic technologies are becoming a routine part of human genetic analysis. The exponential growth in DNA sequencing capability has brought an unprecedented understanding of human genetic variation and the identification of thousands of variants that impact human health. In this review, we describe the different types of DNA variation and provide an overview of existing DNA sequencing technologies and their applications. As genomic technologies and knowledge continue to advance, they will become integral in clinical practice. To accomplish the goal of personalized genomic medicine for patients, close collaborations between researchers and clinicians will be essential to develop and curate deep databases of genetic variation and their associated phenotypes. PMID:24021669

  7. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  8. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    PubMed Central

    Cronn, Richard; Liston, Aaron; Parks, Matthew; Gernandt, David S.; Shen, Rongkun; Mockler, Todd

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies, however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to simultaneously sequence multiple genomes using the Illumina Genome Analyzer. We PCR-amplified ~120 kb plastomes from eight species (seven Pinus, one Picea) in 35 reactions. Pooled products were ligated to modified adapters that included 3 bp indexing tags and samples were multiplexed at four genomes per lane. Tagged microreads were assembled by de novo and reference-guided assembly methods, using previously published Pinus plastomes as surrogate references. Assemblies for these eight genomes are estimated at 88–94% complete, with an average sequence depth of 55× to 186×. Mononucleotide repeats interrupt contig assembly with increasing repeat length, and we estimate that the limit for their assembly is 16 bp. Comparisons to 37 kb of Sanger sequence show a validated error rate of 0.056%, and conspicuous errors are evident from the assembly process. This efficient sequencing approach yields high-quality draft genomes and should have immediate applicability to genomes with comparable complexity. PMID:18753151

  9. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology.

    PubMed

    Cronn, Richard; Liston, Aaron; Parks, Matthew; Gernandt, David S; Shen, Rongkun; Mockler, Todd

    2008-11-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies, however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to simultaneously sequence multiple genomes using the Illumina Genome Analyzer. We PCR-amplified approximately 120 kb plastomes from eight species (seven Pinus, one Picea) in 35 reactions. Pooled products were ligated to modified adapters that included 3 bp indexing tags and samples were multiplexed at four genomes per lane. Tagged microreads were assembled by de novo and reference-guided assembly methods, using previously published Pinus plastomes as surrogate references. Assemblies for these eight genomes are estimated at 88-94% complete, with an average sequence depth of 55x to 186x. Mononucleotide repeats interrupt contig assembly with increasing repeat length, and we estimate that the limit for their assembly is 16 bp. Comparisons to 37 kb of Sanger sequence show a validated error rate of 0.056%, and conspicuous errors are evident from the assembly process. This efficient sequencing approach yields high-quality draft genomes and should have immediate applicability to genomes with comparable complexity. PMID:18753151

  10. Draft genome sequences of two virulent serotypes of avian Pasteurella multocida

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent Pasteurella multocida strain Pm70....

  11. Sequence Analysis and Organization of the Neodiprion abietis Nucleopolyhedrovirus Genome

    PubMed Central

    Duffy, Simon P.; Young, Aaron M.; Morin, Benoit; Lucarotti, Christopher J.; Koop, Ben F.; Levin, David B.

    2006-01-01

    Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses include the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements. PMID:16809301

  12. Complete genome sequence of equine herpesvirus type 9.

    PubMed

    Fukushi, Hideto; Yamaguchi, Tsuyoshi; Yamada, Souichi

    2012-12-01

    Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome has a length of 148,371 bp and all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9. PMID:23166237

  13. Transcriptome and genome sequencing uncovers functional variation in humans

    PubMed Central

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; ‘t Hoen, Peter AC; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk PJ; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Ángel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-01-01

    Summary: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome. PMID:24037378

  14. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon. PMID:25746396

  15. Sequencing the Genome of the Heirloom Watermelon Cultivar Charleston Gray

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the watermelon cultivar Charleston Gray, a major heirloom which has been used in breeding programs of many watermelon cultivars, was sequenced. Our strategy involved a hybrid approach using the Illumina and 454/Titanium next-generation sequencing technologies. For Illumina, shotgun g...

  16. GENOMIC SEQUENCE ANALYSIS OF LEPTOSPIRA BORGPETERSENII SEROVAR HARDJO

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A genomic library from Leptospira borgpetersenii serovar hardjo strain JB197 was prepared by mechanically shearing the DNA and inserting it into a positive selection vector. DNA was prepared from approximately 22,000 random clones and used as templates for automated sequencing. Sequence data was c...

  17. Environmental Genome Shotgun Sequencing of the Sargasso Sea

    E-print Network

    Bruns, Tom

    Environmental Genome Shotgun Sequencing of the Sargasso Sea J. Craig Venter,1 * Karin Remington,1 collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence characterization. To help ensure a tractable pilot study, we sampled in the Sargasso Sea, a nutrient-limited, open

  18. Draft Genome Sequence of Prosthecomicrobium hirschii ATCC 27832T

    PubMed Central

    Daniel, Jeremy J.; Givan, Scott A.; Brun, Yves V.

    2015-01-01

    We report the draft genome sequence of Prosthecomicrobium hirschii ATCC 27832T, an alphaproteobacterium with remarkable cellular morphologies. The chromosome comprises 6,484,983 bp in six scaffolds with a G+C content of 69%, and 6,066 potential coding sequences. PMID:26586892

  19. Draft Genome Sequence of Lactobacillus fermentum NB-22

    PubMed Central

    Shkoporov, A. N.; Efimov, B. A.; Pikina, A. P.; Borisova, O. Y.; Gladko, I. A.; Postnikova, E. A.; Lordkipanidze, A. E.; Kafarskaia, L. I.

    2015-01-01

    We announce here a draft genome sequence of Lactobacillus fermentum NB-22, a strain isolated from human vaginal microbiota. The assembled sequence consists of 190 contigs, joined into 137 scaffolds, and the total size is 2.01 Mb. PMID:26272572

  20. Rosetta Genomics Announces Next-Generation Sequencing Research Collaboration with

    E-print Network

    Pilpel, Yitzhak

    discoveries and technological applications. Working in collaboration with the Institute provides us identification of microRNA sequences. These advances will allow us to incorporate sequencing in more of our treatment. Rosetta Genomics estimates that, in the U.S. alone, 200,000 patients a year may benefit from

  1. Complete Genome Sequences of Mandrillus leucophaeus and Papio ursinus Cytomegaloviruses.

    PubMed

    Blewett, Earl Linwood; Sherrod, Carly J; Texier, Jordan R; Conrad, Tom M; Dittmer, Dirk P

    2015-01-01

    The complete genome sequences of Mandrillus leucophaeus and Papio ursinus cytomegaloviruses were determined. An isolate from a drill monkey, OCOM6-2, and an isolate from a chacma baboon, OCOM4-52, were subjected to pyrosequencing and assembled. Comparative alignment of published primate cytomegaloviruses (CMVs) showed variable sequence conservation between species. PMID:26251484

  2. Distribution and intensity of constraint in mammalian genomic sequence

    E-print Network

    Sidow, Arend

    sequence conservation to identify regions of functional importance in mammals (Pennacchio et al. 2001 that comparative sequence analysis is a powerful paradigm for the discovery of those functional regions in the human genome whose experimental discovery is difficult (O'Brien et al. 1999; Hardison 2000; Pennacchio

  3. Complete Genome Sequence of the Alfalfa latent virus

    PubMed Central

    Shao, Jonathan; Postnikova, Olga A.

    2015-01-01

    The first complete genome sequence of the Alfalfa latent carlavirus (ALV) was obtained by primer walking and Illumina RNA sequencing. The virus differs substantially from the Czech ALV isolate and the Pea streak virus isolate from Wisconsin. The absence of a clear nucleic acid-binding protein indicates ALV divergence from other carlaviruses. PMID:25883281

  4. Complete Genomic Sequence of Issyk-Kul Virus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Fooks, Anthony R.; Hewson, Roger

    2015-01-01

    Issyk-Kul virus (ISKV) is an ungrouped virus tentatively assigned to the Bunyaviridae family and is associated with an acute febrile illness in several central Asian countries. Using next-generation sequencing technologies, we report here the full-genome sequence for this novel unclassified arboviral pathogen circulating in central Asia. PMID:26139711

  5. Complete Genome Sequence of Kocuria palustris MU14/1

    PubMed Central

    Foecking, Mark F.

    2015-01-01

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences. PMID:26472837

  6. PHYTOPHTHORA GENOME SEQUENCES UNCOVER EVOLUTIONARY ORIGINS AND MECHANISMS OF PATHOGENESIS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Draft genome sequences of the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopiles with photosynthetic algae such as diatoms, and the Phytophthora sequences sugges...

  7. Complete Genome Sequence of Caulobacter crescentus Siphophage Sansa

    PubMed Central

    Vara, Leonardo; Kane, Ashley A.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Caulobacter crescentus is a Gram-negative dimorphic model organism used to study cell differentiation. Siphophage Sansa is a newly isolated siphophage with an icosahedral capsid that infects C. crescentus. Sansa shares no sequence similarity to other phages deposited in GenBank. Here, we describe its genome sequence and general features. PMID:26450723

  8. Draft Genome Sequences of Two Toxigenic Corynebacterium ulcerans Strains

    PubMed Central

    Fournier, Eric; Massé, Cynthia; Charest, Hugues; Bernard, Kathryn; Côté, Jean-Charles; Tremblay, Cécile

    2015-01-01

    Here, we present the draft genome sequences of two toxigenic Corynebacterium ulcerans strains isolated from two different patients: one from a blood sample and the other from a scar exudate following surgery. Although these two strains harbor the diphtheria toxin gene tox, no full prophage sequences were found in the flanking regions. PMID:26112794

  9. Complete Genome Sequence of Caulobacter crescentus Siphophage Sansa.

    PubMed

    Vara, Leonardo; Kane, Ashley A; Cahill, Jesse L; Rasche, Eric S; Kuty Everett, Gabriel F

    2015-01-01

    Caulobacter crescentus is a Gram-negative dimorphic model organism used to study cell differentiation. Siphophage Sansa is a newly isolated siphophage with an icosahedral capsid that infects C. crescentus. Sansa shares no sequence similarity to other phages deposited in GenBank. Here, we describe its genome sequence and general features. PMID:26450723

  10. Genome sequence of Stachybotrys chartarum Strain 51-11

    EPA Science Inventory

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  11. Complete Genome Sequence of Southern tomato virus Identified in China Using Next-Generation Sequencing

    PubMed Central

    Padmanabhan, Chellappan; Zheng, Yi; Li, Rugang; Sun, Shu-E; Zhang, Deyong; Liu, Yong; Fei, Zhangjun

    2015-01-01

    The complete genome sequence of Southern tomato virus (STV), a double-stranded RNA virus that affects tomato in China, was determined using small RNA deep sequencing. This Chinese isolate shares 99% sequence identity to other isolates from Mexico, France, Spain, and the United States. This is the first report of STV infecting tomatoes in Asia. PMID:26494671

  12. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  13. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
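    The record above reports N50 values and a genome coverage percentage. As a hedged illustration of how such figures are commonly computed (the abstract does not describe the authors' own tooling), the sketch below implements the usual N50 definition and recomputes the coverage from the two reported totals. The function name `n50` and the toy lengths in the usage note are assumptions for illustration only.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

# Coverage implied by the two totals reported in the abstract:
# 568,887,315 bp assembled out of an estimated 622 Mb genome.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage_percent = 100.0 * assembled_bp / estimated_genome_bp
```

    For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` walks down from the longest sequence until the running total reaches half of 54, which happens at length 8. The computed `coverage_percent` rounds to the 91% stated in the abstract.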

  14. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172

  15. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ~65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  16. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background: The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions: This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  17. Draft Genome Sequences of Two South African Bacillus anthracis Strains

    PubMed Central

    Lekota, Kgaugelo E.; Mafofo, Joseph; Madoroba, Evelyn; Rees, Jasper; van Heerden, Henriette

    2015-01-01

    Bacillus anthracis is a Gram-positive bacterium that causes anthrax, mainly in herbivores through exotoxins and capsule produced on plasmids, pXO1 and pXO2. This paper compares the whole-genome sequences of two B. anthracis strains from an endemic region and a sporadic outbreak in South Africa. Sequencing was done using next-generation sequencing technologies. PMID:26586878

  18. Draft Genome Sequences of Two South African Bacillus anthracis Strains.

    PubMed

    Lekota, Kgaugelo E; Mafofo, Joseph; Madoroba, Evelyn; Rees, Jasper; van Heerden, Henriette; Muchadeyi, Farai C

    2015-01-01

    Bacillus anthracis is a Gram-positive bacterium that causes anthrax, mainly in herbivores through exotoxins and capsule produced on plasmids, pXO1 and pXO2. This paper compares the whole-genome sequences of two B. anthracis strains from an endemic region and a sporadic outbreak in South Africa. Sequencing was done using next-generation sequencing technologies. PMID:26586878

  19. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2013-02-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  20. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  1. Sequence-Based Mapping of the Polyploid Wheat Genome

    PubMed Central

    Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

    2013-01-01

    The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40–100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome. PMID:23665877

  2. The clinical potential and challenges of sequencing cancer genomes for personalized medical genomics.

    PubMed

    Cloonan, Nicole; Waddell, Nic; Grimmond, Sean M

    2010-11-01

    Next-generation sequencing is revolutionizing the way in which genomic-scale biological research is performed, and its effects are beginning to be translated medically. Large-scale international collaborations for the comprehensive sequencing of the genome, epigenome, and transcriptomes of cancers and corresponding 'normal' (germ-line) DNA are heralding the start of personalized medical genomics. The promise of eliminating conjecture when determining treatment approaches is certainly appealing for both patients and clinicians; however, several major issues must be resolved before next-generation sequencing will be adopted as a routine clinical tool for patients. This feature review explores the clinical potential and challenges of studying cancer genomes for personalized medical genomics. PMID:21046525

  3. Genomic Sequence or Signature Tags (GSTs) from the Genome Group at Brookhaven National Laboratory (BNL)

    DOE Data Explorer

    Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K.

    Genomic Signature Tags (GSTs) are the products of a method we have developed for identifying and quantitatively analyzing genomic DNAs. The DNA is initially fragmented with a type II restriction enzyme. An oligonucleotide adaptor containing a recognition site for MmeI, a type IIS restriction enzyme, is then used to release 21-bp tags from fixed positions in the DNA relative to the sites recognized by the fragmenting enzyme. These tags are PCR-amplified, purified, concatenated and then cloned and sequenced. The tag sequences and abundances are used to create a high resolution GST sequence profile of the genomic DNA. [Quoted from Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA, Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K., Revised 9/13/2002]
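
    The tag-release step described above can be imitated in silico. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the fragmenting-enzyme site (`CATG`), the function name `gst_profile`, and the choice to read each 21-bp tag starting at the site on a single strand are all hypothetical simplifications. The abstract fixes only the tag length (21 bp) and the fact that tags sit at fixed positions relative to the fragmenting enzyme's sites.

```python
from collections import Counter

def gst_profile(genome, site="CATG", tag_len=21):
    """Count in-silico tags anchored at every occurrence of `site`.

    Assumptions (not from the source): the site sequence, single-strand
    scanning, and tags that begin at the first base of the site.
    """
    genome = genome.upper()
    tags = Counter()
    start = genome.find(site)
    while start != -1:
        tag = genome[start:start + tag_len]
        # Discard truncated tags that run off the end of the sequence.
        if len(tag) == tag_len:
            tags[tag] += 1
        # Step by one base so overlapping site occurrences are not missed.
        start = genome.find(site, start + 1)
    return tags

# Toy sequence: the same 21-bp tag occurs twice; a third site is too close to the end.
tag1 = "CATG" + "A" * 17
toy_genome = "TT" + tag1 + "GG" + tag1 + "C" + "CATGTT"
profile = gst_profile(toy_genome)
```

    The resulting counter maps each tag sequence to its abundance, which is the information the abstract says is used to build the GST profile.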

  4. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those with repeat motifs of 1–3 nucleotides. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and with increasing repeat number in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number, different species favored different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but were also associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution. PMID:26630570
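
    As an illustration of the kind of survey this abstract describes, the sketch below finds perfect SSRs with motifs of 1–3 nucleotides and reports the share of motifs composed only of A and T. It is a minimal example under stated assumptions, not the authors' method: the minimum repeat counts (10, 6 and 5 copies for mono-, di- and trinucleotide motifs) and the helper names `find_ssrs` and `at_share` are hypothetical and do not come from the paper.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs with 1-3 nt motifs."""
    # Hypothetical thresholds per motif length; not taken from the paper.
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of exactly k bases followed by at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip homopolymer motifs such as "AA" or "TTT"; they belong to k = 1.
            if k > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

def at_share(hits, k):
    """Fraction of SSRs with motif length k whose motif contains only A and T."""
    motifs = [motif for _, motif, _ in hits if len(motif) == k]
    if not motifs:
        return None
    return sum(set(motif) <= set("AT") for motif in motifs) / len(motifs)

# Toy sequence containing one mono-, one di- and one trinucleotide SSR.
toy = "G" + "A" * 12 + "C" + "AT" * 7 + "T" + "CAG" * 5 + "T"
ssrs = find_ssrs(toy)
```

    Running `find_ssrs` on each genome and comparing `at_share` across species and repeat-count bins would give a rough analogue of the A/T trends the abstract reports.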

  5. Genome Sequence of the Pea Aphid Acyrthosiphon pisum

    PubMed Central

    2010-01-01

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

  6. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861

  7. Comparison of methods for genomic localization of gene trap sequences

    PubMed Central

    Harper, Courtney A; Huang, Conrad C; Stryke, Doug; Kawamoto, Michiko; Ferrin, Thomas E; Babbitt, Patricia C

    2006-01-01

    Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular. PMID:16982004

  8. Center for Cell and Genome Sciences, Crocker Science Building

    E-print Network

    Tipple, Brett

    chemistry Center for Cell and Genome Sciences genetic engineering building artificial life brain engineering photodiodes the Cell engineering the genome, imaging proteins Invitrogen at the intersection of chemistry

  9. Widespread mitovirus sequences in plant genomes

    PubMed Central

    Warner, Benjamin E.; Yerramsetty, Pradeep

    2015-01-01

    The exploration of the evolution of RNA viruses has been aided recently by the discovery of copies of fragments or complete genomes of non-retroviral RNA viruses (Non-retroviral Endogenous RNA Viral Elements, or NERVEs) in many eukaryotic nuclear genomes. Among the most prominent NERVEs are partial copies of the RNA dependent RNA polymerase (RdRP) of the mitoviruses in plant mitochondrial genomes. Mitoviruses are in the family Narnaviridae, which are the simplest viruses, encoding only a single protein (the RdRP) in their unencapsidated viral plus strand. Narnaviruses are known only in fungi, and the origin of plant mitochondrial mitovirus NERVEs appears to be horizontal transfer from plant pathogenic fungi. At least one mitochondrial mitovirus NERVE, but not its nuclear copy, is expressed. PMID:25870770

  10. The International Pea Genome Sequencing Project: Sequencing and Assembly Progresses Updates

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The International Consortium for the Pea Genome Sequencing (ICPG) includes scientists from six countries around the world. Its aim is to provide a high quality reference of the pea genome to the scientific community as well as to the pea breeder community. The consortium proposed a strategy that int...

  11. The power of EST sequence data: Relation to Acyrthosiphon pisum genome annotation and functional genomics initiatives

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genes important to aphid biology, survival and reproduction were successfully identified by use of a genomics approach. We created and described the sequencing, compilation, and annotation of the approximately 525Mb nuclear genome of the pea aphid, Acyrthosiphon pisum, which represents an important ...

  12. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing

    PubMed Central

    Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.

    2015-01-01

    Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853

  13. Sequencing and analysis of an Irish human genome

    PubMed Central

    2010-01-01

    Background Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence. Results Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identified novel disease-associated variants, including a novel nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage. Conclusions Our findings show that there remains utility in generating whole genome sequences to illustrate both general principles and reveal specific instances of human biology. With increasing access to low cost sequencing we would predict that even armed with the resources of a small research group a number of similar initiatives geared towards answering specific biological questions will emerge. PMID:20822512

  14. Complete genome sequence of Arthrobacter sp. strain FB24

    SciTech Connect

    Nakatsu, C. H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, T.; Han, Cliff F.; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-09-30

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

  15. Draft genome sequence of the rubber tree Hevea brasiliensis

    PubMed Central

    2013-01-01

    Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber. PMID:23375136

  16. Whole genome sequencing in clinical and public health microbiology

    PubMed Central

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Summary Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  17. Complete genome sequence of Meiothermus ruber type strain (21T)

    SciTech Connect

    Tindall, Brian; Sikorski, Johannes; Lucas, Susan; Goltsman, Eugene; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Fahnrich, Regine; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Meiothermus ruber (Loginova et al. 1984) Nobre et al. 1996 is the type species of the genus Meiothermus. This thermophilic genus is of special interest, as its members can be affiliated to either low-temperature or high-temperature groups. The temperature related split is in accordance with the chemotaxonomic feature of the polar lipids. M. ruber is a representative of the low-temperature group. This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,097,457 bp long genome with its 3,052 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    SciTech Connect

    Mavromatis, K; Sikorski, Johannes; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Tom; Detter, J C; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rudiger; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family Alicyclobacillaceae. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Alicyclobacillaceae. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Arthrobacter sp. strain FB24

    PubMed Central

    Nakatsu, Cindy H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, Thomas; Han, Cliff; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-01-01

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program. PMID:24501649

  20. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)

    SciTech Connect

    Spring, Stefan; Lapidus, Alla L.; Schroder, Maren; Gleim, Dorothea; Sims, David; Meincke, Linda; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Brettin, Tom; Detter, J. Chris; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-01-01

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Genome Sequences of Mycobacteriophages Luchador and Nerujay

    PubMed Central

    Ahmed, Taha; Drobitch, Marissa K.; Early, David R.; Eljamri, Soukaina; Kasturiarachi, Naomi S.; Klonicki, Emily F.; Manjooran, Daniel T.; Ní Chochlain, Aífe N.; Puglionesi, Andrew O.; Rajakumar, Vinod; Shindle, Katherine A.; Tran, Mai T.; Brown, Bryony R.; Churilla, Bryce M.; Cohen, Karen L.; Wilkes, Kellyn E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Luchador and Nerujay are two newly isolated mycobacteriophages recovered from soil samples using Mycobacterium smegmatis. Their genomes are 53,387 bp and 53,455 bp long and have 96 and 97 predicted open reading frames, respectively. Nerujay is related to subcluster A1 phages, and Luchador represents a new subcluster, A14. PMID:26089414

  2. Genome Sequences of Mycobacteriophages Luchador and Nerujay.

    PubMed

    Pope, Welkin H; Ahmed, Taha; Drobitch, Marissa K; Early, David R; Eljamri, Soukaina; Kasturiarachi, Naomi S; Klonicki, Emily F; Manjooran, Daniel T; Ní Chochlain, Aífe N; Puglionesi, Andrew O; Rajakumar, Vinod; Shindle, Katherine A; Tran, Mai T; Brown, Bryony R; Churilla, Bryce M; Cohen, Karen L; Wilkes, Kellyn E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-01-01

    Luchador and Nerujay are two newly isolated mycobacteriophages recovered from soil samples using Mycobacterium smegmatis. Their genomes are 53,387 bp and 53,455 bp long and have 96 and 97 predicted open reading frames, respectively. Nerujay is related to subcluster A1 phages, and Luchador represents a new subcluster, A14. PMID:26089414

  3. Complete genome sequence of Croceibacter atlanticus HTCC2559T.

    PubMed

    Oh, Hyun-Myung; Kang, Ilnam; Ferriera, Steve; Giovannoni, Stephen J; Cho, Jang-Cheon

    2010-09-01

    Here we announce the complete genome sequence of Croceibacter atlanticus HTCC2559(T), which was isolated by high-throughput dilution-to-extinction culturing from the Bermuda Atlantic Time Series station in the Western Sargasso Sea. Strain HTCC2559(T) contained genes for carotenoid biosynthesis, flavonoid biosynthesis, and several macromolecule-degrading enzymes. The genome confirmed physiological observations of cultivated Croceibacter atlanticus strain HTCC2559(T), which identified it as an obligate chemoheterotroph. PMID:20639333

  4. Genome Sequence of the Urethral Isolate Pseudomonas aeruginosa RN21

    PubMed Central

    Wibberg, Daniel; Tielen, Petra; Narten, Maike; Schobert, Max; Blom, Jochen; Schatschneider, Sarah; Meyer, Ann-Kathrin; Neubauer, Rüdiger; Albersmeier, Andreas; Albaum, Stefan; Jahn, Martina; Goesmann, Alexander; Vorhölter, Frank-Jörg; Pühler, Alfred

    2015-01-01

    Pseudomonas aeruginosa is known to cause complicated urinary tract infections (UTI). The improved 7.0-Mb draft genome sequence of P. aeruginosa RN21, isolated from a patient with an acute UTI, was determined. It carries three (pro)phage genomes, genes for two restriction/modification systems, and a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system. PMID:26184943

  5. Genome Sequence of the Urethral Isolate Pseudomonas aeruginosa RN21.

    PubMed

    Wibberg, Daniel; Tielen, Petra; Narten, Maike; Schobert, Max; Blom, Jochen; Schatschneider, Sarah; Meyer, Ann-Kathrin; Neubauer, Rüdiger; Albersmeier, Andreas; Albaum, Stefan; Jahn, Martina; Goesmann, Alexander; Vorhölter, Frank-Jörg; Pühler, Alfred; Jahn, Dieter

    2015-01-01

    Pseudomonas aeruginosa is known to cause complicated urinary tract infections (UTI). The improved 7.0-Mb draft genome sequence of P. aeruginosa RN21, isolated from a patient with an acute UTI, was determined. It carries three (pro)phage genomes, genes for two restriction/modification systems, and a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system. PMID:26184943

  6. Contribution to Sequencing of the Deinococcus radiodurans Genome

    SciTech Connect

    Minton, K.W.

    1999-03-11

    The stated goal of this project was to supply The Institute for Genomic Research (TIGR) with pure DNA from the bacterium Deinococcus radiodurans R1 for purposes of complete genomic sequencing by TIGR. We subsequently decided to expand this project to include a second goal: the development of a NotI chromosomal map of D. radiodurans R1 using Pulsed Field Gel Electrophoresis (PFGE).

  7. Analysis of Complete Genome Sequences of Human Rhinovirus

    PubMed Central

    Palmenberg, Ann C.; Rathe, Jennifer A.; Liggett, Stephen B.

    2010-01-01

    Human Rhinovirus (HRV) infection is the cause of about one-half of asthma and COPD exacerbations. With >100 serotypes in the HRV reference set, an effort was undertaken to sequence their complete genomes so as to understand diversity, structural variation, and evolution of the virus. Analysis revealed conserved motifs, hypervariable regions, a potential fourth HRV species, within-serotype variation in field isolates, a non-scanning internal ribosome entry site, and evidence for HRV recombination. Techniques have now been developed using next generation sequencing to generate complete genomes from patient isolates with high throughput, deep coverage, and low costs. Thus relationships can now be sought between obstructive lung phenotypes and variation in HRV genomes in infected patients, and potential novel therapeutic strategies can be developed based on HRV sequence. PMID:20471068

  8. The complete genome sequence of Escherichia coli K-12.

    PubMed

    Blattner, F R; Plunkett, G; Bloch, C A; Perna, N T; Burland, V; Riley, M; Collado-Vides, J; Glasner, J D; Rode, C K; Mayhew, G F; Gregor, J; Davis, N W; Kirkpatrick, H A; Goeden, M A; Rose, D J; Mau, B; Shao, Y

    1997-09-01

    The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer. PMID:9278503

  9. The complete mitochondrial genome sequence of the Daweishan Mini chicken.

    PubMed

    Yan, Ming-Li; Ding, Su-Ping; Ye, Shao-Hui; Wang, Chun-Guang; He, Bao-Li; Yuan, Zhi-Dong; Liu, Li-Li

    2016-01-01

    Daweishan Mini chicken is a valuable chicken breed in China. In this study, the complete mitochondrial genome sequence of Daweishan Mini chicken using PCR amplification, sequencing and assembling has been obtained for the first time. The total length of the mitochondrial genome was 16,785 bp, with the base composition of 30.26% A, 23.73% T, 32.51% C, 13.51% G. It contained 37 genes (2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes) and a major non-coding control region (D-loop region). The protein start codons are ATG, except for COX1 that begins with GTG. The complete mitochondrial genome sequence of Daweishan Mini chicken provides an important data set for further investigation on the phylogenetic relationships within Gallus gallus. PMID:24450719
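
    The base composition quoted in the record above is a per-base percentage of the total sequence length. A minimal sketch of that calculation follows; the function name `base_composition` and the toy sequence are illustrative assumptions, not taken from the record, and the real 16,785 bp sequence is not reproduced here. Because each percentage is rounded independently, the reported values (30.26 + 23.73 + 32.51 + 13.51) can sum to 100.01 rather than exactly 100.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, T, C and G in seq, rounded to 2 decimals.

    Characters other than A/T/C/G (for example N) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no A, T, C or G")
    return {base: round(100 * counts[base] / total, 2) for base in "ATCG"}


# Toy sequence (hypothetical), used only to show the call.
toy = "AACCCGTTAC"
composition = base_composition(toy)
print(composition)
```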

  10. Whole-Genome Sequencing for Optimized Patient Management

    PubMed Central

    Bainbridge, Matthew N.; Wiszniewski, Wojciech; Murdock, David R.; Friedman, Jennifer; Gonzaga-Jauregui, Claudia; Newsham, Irene; Reid, Jeffrey G.; Fink, John K.; Morgan, Margaret B.; Gingras, Marie-Claude; Muzny, Donna M.; Hoang, Linh D.; Yousaf, Shahed; Lupski, James R.; Gibbs, Richard A.

    2012-01-01

    Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)–responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins. PMID:21677200

  11. Complete Genome Sequences for 59 Burkholderia Isolates, Both Pathogenic and Near Neighbor

    DOE PAGESBeta

    Johnson, Shannon L.; Bishop-Lilly, Kimberly A.; Ladner, Jason T.; Daligault, Hajnalka E.; Davenport, Karen W.; Jaissle, James; Frey, Kenneth G.; Koroleva, Galina I.; Bruce, David C.; Coyne, Susan R.; et al

    2015-04-30

    The genus Burkholderia encompasses both pathogenic (including Burkholderia mallei and Burkholderia pseudomallei, U.S. Centers for Disease Control and Prevention Category B listed), and nonpathogenic Gram-negative bacilli. Presented in this document are full genome sequences for a panel of 59 Burkholderia strains, selected to aid in detection assay development.

  12. Complete Genome Sequences for 59 Burkholderia Isolates, Both Pathogenic and Near Neighbor

    PubMed Central

    Bishop-Lilly, Kimberly A.; Ladner, Jason T.; Daligault, Hajnalka E.; Davenport, Karen W.; Jaissle, James; Frey, Kenneth G.; Koroleva, Galina I.; Bruce, David C.; Coyne, Susan R.; Broomall, Stacey M.; Li, Po-E; Teshima, Hazuki; Gibbons, Henry S.; Palacios, Gustavo F.; Rosenzweig, C. Nicole; Redden, Cassie L.; Xu, Yan; Minogue, Timothy D.; Chain, Patrick S.

    2015-01-01

    The genus Burkholderia encompasses both pathogenic (including Burkholderia mallei and Burkholderia pseudomallei, U.S. Centers for Disease Control and Prevention Category B listed), and nonpathogenic Gram-negative bacilli. Here we present full genome sequences for a panel of 59 Burkholderia strains, selected to aid in detection assay development. PMID:25931592

  13. The genome sequence of the colonial chordate, Botryllus schlosseri

    PubMed Central

    Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

    2013-01-01

    Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001 PMID:23840927

  14. Melanoma genome sequencing reveals frequent PREX2 mutations

    PubMed Central

    Berger, Michael F.; Hodis, Eran; Heffernan, Timothy P.; Deribe, Yonathan Lissanu; Lawrence, Michael S.; Protopopov, Alexei; Ivanova, Elena; Watson, Ian R.; Nickerson, Elizabeth; Ghosh, Papia; Zhang, Hailei; Zeid, Rhamy; Ren, Xiaojia; Cibulskis, Kristian; Sivachenko, Andrey Y.; Wagle, Nikhil; Sucker, Antje; Sougnez, Carrie; Onofrio, Robert; Ambrogio, Lauren; Auclair, Daniel; Fennell, Timothy; Carter, Scott L.; Drier, Yotam; Stojanov, Petar; Singer, Meredith A.; Voet, Douglas; Jing, Rui; Saksena, Gordon; Barretina, Jordi; Ramos, Alex H.; Pugh, Trevor J.; Stransky, Nicolas; Parkin, Melissa; Winckler, Wendy; Mahan, Scott; Ardlie, Kristin; Baldwin, Jennifer; Wargo, Jennifer; Schadendorf, Dirk; Meyerson, Matthew; Gabriel, Stacey B.; Golub, Todd R.; Wagner, Stephan N.; Lander, Eric S.; Getz, Gad; Chin, Lynda; Garraway, Levi A.

    2012-01-01

    Melanoma is notable for its metastatic propensity, lethality in the advanced setting, and association with ultraviolet (UV) exposure early in life. To obtain a comprehensive genomic view of melanoma, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-UV exposed hairless skin of the extremities (3 and 14 per Mb genome), intermediate in those originating from hair-bearing skin of the trunk (range = 5 to 55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2 - a PTEN-interacting protein and negative regulator of PTEN in breast cancer - as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas. PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumor formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumors revealed genomic evidence of UV pathogenesis and discovered a new recurrently mutated gene in melanoma. PMID:22622578

  15. The complete mitochondrial genome sequence of the budgerigar, Melopsittacus undulatus.

    PubMed

    Guan, Xiaojing; Xu, Jun; Smith, Edward J

    2016-01-01

    Here, we describe the budgie's mitochondrial genome sequence, a resource that can facilitate this parrot's use as a model organism as well as for determining its phylogenetic relatedness to other parrots/Psittaciformes. The estimated total length of the sequence was 18,193 bp. In addition to the 13 protein and tRNA and rRNA coding regions, the sequence also includes a duplicated hypervariable region, a feature unique to only a few birds. The two hypervariable regions shared a sequence identity of about 86%. PMID:24660934

  16. Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)

    SciTech Connect

    Daligault, Hajnalka E.; Lapidus, Alla L.; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Huntemann, Marcel; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Rohde, Manfred; Verbarg, Susanne; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2011-01-01

    Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Haliscomenobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically uncharted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. The minimum information about a genome sequence (MIGS) specification.

    PubMed

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; DePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; San Gil, Ingio; Wilson, Gareth; Wipat, Anil

    2008-05-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases. PMID:18464787

  18. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  19. Complete genome sequence of Pyrolobus fumarii type strain (1AT)

    SciTech Connect

    Anderson, Iain; Goker, Markus; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Ivanova, N; Pagani, Ioanna; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Huber, Harald; Yasawong, Montri; Rohde, Manfred; Spring, Stefan; Abt, Birte; Sikorski, Johannes; Wirth, Reinhard; Detter, J. Chris; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2011-01-01

    Pyrolobus fumarii Blöchl et al. 1997 is the type species of the genus Pyrolobus, which belongs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii exhibits currently the highest optimal growth temperature of all life forms on earth (106 °C). This is the first completed genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on September 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Adaptive seeds tame genomic sequence comparison.

    PubMed

    Kiełbasa, Szymon M; Wan, Raymond; Sato, Kengo; Horton, Paul; Frith, Martin C

    2011-03-01

    The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billionbase DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition. PMID:21209072
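
    The idea of a seed chosen by rareness rather than by fixed length can be illustrated with a toy sketch. The code below is not the LAST implementation: the function names, the `max_hits` threshold, and the naive substring counting are assumptions made for illustration, and this naive counting does not reproduce the linear running time the record describes. For each start position in the query, the match is lengthened until it occurs at most `max_hits` times in the reference.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(query: str, reference: str, max_hits: int = 1):
    """Return (query_start, seed, hits) for each query position with a rare match.

    A seed is the shortest substring starting at query_start that occurs in
    the reference at least once and at most max_hits times.
    """
    seeds = []
    for i in range(len(query)):
        for j in range(i + 1, len(query) + 1):
            hits = count_occurrences(reference, query[i:j])
            if hits == 0:
                break  # extending further cannot create a match
            if hits <= max_hits:
                seeds.append((i, query[i:j], hits))
                break  # rare enough: stop lengthening
    return seeds


# Toy data (hypothetical): common short words are skipped, rare longer ones kept.
seeds = adaptive_seeds("ACGA", "ACGTACGTTTACGA", max_hits=1)
print(seeds)
```

    In this toy run the seed lengths differ by position: where the query starts with a word that is frequent in the reference, the seed grows longer before it is accepted, which is the behaviour a fixed-length seed cannot provide.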

  1. MinION nanopore sequencing of an influenza genome

    PubMed Central

    Wang, Jing; Moore, Nicole E.; Deng, Yi-Mo; Eccles, David A.; Hall, Richard J.

    2015-01-01

    Influenza epidemics and pandemics have significant impacts on economies, morbidity and mortality worldwide. The ability to rapidly and accurately sequence influenza viruses is instrumental in the prevention and mitigation of influenza. All eight influenza genes from an influenza A virus were amplified by PCR simultaneously and then subjected to sequencing on a MinION nanopore sequencer. A complete influenza virus genome was obtained that shared greater than 99% identity with sequence data obtained from Illumina MiSeq and traditional Sanger-sequencing. The laboratory infrastructure and computing resources used to perform this experiment on the MinION nanopore sequencer would be available in most molecular laboratories around the world. Using this system, the concept of portability, and thus sequencing influenza viruses in the clinic or field is now tenable. PMID:26347715

  2. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  3. Characteristics of cloned repeated DNA sequences in the barley genome

    SciTech Connect

    Anan'ev, E.V.; Bochkanov, S.S.; Ryzhik, M.V.; Sonina, N.V.; Chernyshev, A.I.; Shchipkova, N.I.; Yakovleva, E.Yu.

    1986-12-01

    A partial clone library of barley DNA fragments based on plasmid pBR325 was created. The cloned EcoRI-fragments of chromosomal DNA are from 2 to 14 kbp in length. More than 95% of the barley DNA inserts comprise repeated sequences of different complexity and copy number. Certain of these DNA sequences are from families comprising at least 1% of the barley genome. A significant proportion of the clones hybridize with numerous sets of restriction fragments of genome DNA and they are dispersed throughout the barley chromosomes.

  4. Profiling DNA Methylomes from Microarray to Genome-Scale Sequencing

    PubMed Central

    Huang, Yi-Wen; Huang, Tim H.-M.; Wang, Li-Shu

    2010-01-01

    DNA cytosine methylation is a central epigenetic modification which plays critical roles in cellular processes including genome regulation, development and disease. Here, we review current and emerging microarray and next-generation sequencing based technologies that enhance our knowledge of DNA methylation profiling. Each methodology has limitations and their unique applications, and combinations of several modalities may help build the entire methylome. With advances on next-generation sequencing technologies, it is now possible to globally map the DNA cytosine methylation at single-base resolution, providing new insights into the regulation and dynamics of DNA methylation in genomes. PMID:20218736

  5. The genomic sequence of cardamine chlorotic fleck carmovirus.

    PubMed

    Skotnicki, M L; Mackenzie, A M; Torronen, M; Gibbs, A J

    1993-09-01

    The complete genomic sequence of cardamine chlorotic fleck carmovirus (CCFV) has been determined. The genome is a positive-sense ssRNA molecule 4041 nucleotides in length, and has 47 to 64% sequence identity with turnip crinkle, carnation mottle and melon necrotic spot carmoviruses. CCFV and these other carmoviruses have four similar open reading frames (ORFs), and CCFV has large regions of amino acid identity in all of these ORFs with a European isolate of turnip crinkle virus. CCFV, which replicates well in Arabidopsis thaliana, has only been found so far in Australia in the wild perennial brassica Cardamine lilacina. PMID:8376969

  6. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida.

    PubMed

    Zhang, Lei; Wang, Xumin; Liu, Tao; Wang, Guoliang; Chi, Shan; Liu, Cui; Wang, Haiyang

    2015-01-01

    In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp) in size; it contains a large single-copy (LSC, 76,598 bp) and a small single-copy region (SSC, 42,977 bp), separated by two inverted repeats (IRa and IRb: 5,404 bp). The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome. PMID:26426800
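
    The region lengths quoted in the record above can be checked against the reported total: a circular plastid genome with one large single-copy region, one small single-copy region, and two inverted-repeat copies has length LSC + SSC + 2 × IR. A minimal sketch of that check follows, using only the figures from the abstract; the function name `quadripartite_length` is an illustrative assumption.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 76_598  # large single-copy region
SSC = 42_977  # small single-copy region
IR = 5_404    # length of each inverted repeat (IRa and IRb)


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome made of one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total)  # 130383, matching the reported 130,383 bp
```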

  7. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida

    PubMed Central

    Liu, Tao; Wang, Guoliang; Chi, Shan; Liu, Cui; Wang, Haiyang

    2015-01-01

    In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp) in size; it contains a large single-copy (LSC, 76,598 bp) and a small single-copy region (SSC, 42,977 bp), separated by two inverted repeats (IRa and IRb: 5,404 bp). The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome. PMID:26426800

  8. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    PubMed Central

    2011-01-01

    Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930

  9. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher. PMID:26107775

  10. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der (Lawrence Berkeley Lab., CA)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  11. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences, and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  12. NCI Center of Excellence in Integrative Cancer Biology and Genomics

    Cancer.gov

    The Center of Excellence in Integrative Cancer Biology and Genomics (CEICBG) is one of four Centers of Excellence established within the NCI Intramural Research Program (IRP). The Centers of Excellence build upon existing structures

  13. Repeated sequences in bacterial chromosomes and plasmids: a glimpse from sequenced genomes.

    PubMed

    Romero, D; Martínez-Salazar, J; Ortiz, E; Rodríguez, C; Valencia-Morales, E

    1999-01-01

    To gain insight into the extent of exact DNA repeats in sequenced bacterial genomes and their plasmids, we analyzed the collection of completely sequenced bacterial genomes available at GenBank using the program Miropeats. This program draws graphical representations of exact DNA repeats in whole genomes. In this work, we present maps showing the extent and type (inverted or direct) of exact DNA repeats longer than 300 bp for the whole collection. These repeats may participate in a variety of events relevant for bacterial genome plasticity, such as amplifications, deletions, inversions, and translocations (via homologous recombination), as well as transposition. Additionally, we review recent data showing that high-frequency architectural variations in genomic structure occur at both the interspecies and interstrain levels. PMID:10673011

  14. Complete genome sequence of Allochromatium vinosum DSM 180T

    SciTech Connect

    Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-Juan; Detter, J. Chris; Han, Cliff; Hauser, Loren John; Jeffries, Cynthia; Land, Miriam L; Munk, Christine; Tapia, Roxanne; Dahl, Christiane

    2011-01-01

    Allochromatium vinosum (formerly Chromatium vinosum) is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium currently contains five species. All members were isolated from freshwater, brackish water or marine habitats and are predominantly obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program.

  15. Easy quantitative assessment of genome editing by sequence trace decomposition.

    PubMed

    Brinkman, Eva K; Chen, Tao; Amendola, Mario; van Steensel, Bas

    2014-12-16

    The efficacy and the mutation spectrum of genome editing methods can vary substantially depending on the targeted sequence. A simple, quick assay to accurately characterize and quantify the induced mutations is therefore needed. Here we present TIDE, a method for this purpose that requires only a pair of PCR reactions and two standard capillary sequencing runs. The sequence traces are then analyzed by a specially developed decomposition algorithm that identifies the major induced mutations in the projected editing site and accurately determines their frequency in a cell population. This method is cost-effective and quick, and it provides much more detailed information than current enzyme-based assays. An interactive web tool for automated decomposition of the sequence traces is available. TIDE greatly facilitates the testing and rational design of genome editing strategies. PMID:25300484

  16. Next Generation Sequencing to Characterize Mitochondrial Genomic DNA Heteroplasmy

    PubMed Central

    Huang, Taosheng

    2015-01-01

    This protocol describes a method for characterizing mitochondrial DNA (mtDNA) heteroplasmy with parallel sequencing. Mitochondria play a central role in important cellular functions. Each eukaryotic cell contains hundreds of mitochondria with hundreds of mitochondrial genomes. Mutant and wild-type mtDNA may co-exist as heteroplasmy and cause human disease. The purpose of this method is to determine the mtDNA sequence and quantify the heteroplasmy level simultaneously. The protocol includes PCR amplification of the mitochondrial genome in two fragments. The PCR products are then mixed at an equimolar ratio. The samples are barcoded and sequenced with high-throughput next-generation sequencing technology. We found that this technology is highly sensitive, specific, and accurate in determining mtDNA mutations and the degree of heteroplasmy. PMID:21975941
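
    As a rough illustration of what quantifying the heteroplasmy level can mean once reads are aligned, the sketch below computes the fraction of non-reference reads at a single mtDNA position. The function name, the reference allele, and the read counts are hypothetical and are not taken from the protocol; a real pipeline would also filter on base quality and sequencing error.

```python
def heteroplasmy_level(allele_counts, reference_allele):
    """Fraction of reads at one position that do not carry the reference allele.

    allele_counts: dict mapping base -> read count (hypothetical input format).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    variant_reads = total - allele_counts.get(reference_allele, 0)
    return variant_reads / total


# Hypothetical counts: 900 reads with the reference base, 100 with a variant.
counts = {"A": 900, "G": 100}
level = heteroplasmy_level(counts, reference_allele="A")
print(f"heteroplasmy level: {level:.1%}")
```

    With deep coverage, the same ratio can resolve low-level variants, which is why read depth matters for sensitivity.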

  17. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    SciTech Connect

    Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Glavina Del Rio, Tijana; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Sims, David; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Chang, Yun-Juan; Larimer, Frank W; Land, Miriam L; Hauser, Loren John; Kyrpides, Nikos C; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; Wang, Ying; Layton, Alice; Allen, Michael S.; Sayler, Gary

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to genus Thauera, of the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774.

  18. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    PubMed Central

    Jiang, Ke; Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Chang, Y.J.; Larimer, Frank; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; DeBruyn, Jennifer M.; Wang, Ying; Layton, Alice C.; Allen, Michael S.; Sayler, Gary S.

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to genus Thauera, of the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774. PMID:23407619

  19. Final progress report, Construction of a genome-wide highly characterized clone resource for genome sequencing

    SciTech Connect

    Nierman, William C.

    2000-02-14

    At TIGR, human Bacterial Artificial Chromosome (BAC) end sequencing and trimming were carried out with an overall sequencing success rate of 65%. CalTech human BAC libraries A, B, C and D as well as Roswell Park Cancer Institute's library RPCI-11 were used. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of ~460 bp for a total of 141 Mb covering ~4.7% of the genome. Over sixty percent of the clones have BAC end sequences (BESs) from both ends, representing over five-fold coverage of the genome by the paired-end clones. The average phred Q20 length is ~400 bp. This high accuracy makes our BESs match the human finished sequences with an average identity of 99% and a match length of 450 bp, and a frequency of one match per 12.8 kb of contig sequence. Our sample tracking has ensured a clone tracking accuracy of >90%, which gives researchers high confidence in (1) retrieving the right clone from the BAC libraries based on the sequence matches; and (2) building a minimum tiling path of sequence-ready clones across the genome and genome assembly scaffolds.
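
    The sequence totals quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures stated above, plus one assumption that is not in the report: a haploid human genome size of roughly 3,000 Mb. Because the read count is given as a lower bound (>300,000), the computed total is a lower bound as well.

```python
# Figures quoted in the abstract.
end_sequences = 300_000        # reported as ">300,000"
mean_read_length_bp = 460      # reported as approximately 460 bp
reported_total_mb = 141        # reported total sequence

# Assumption (not from the report): haploid human genome of about 3,000 Mb.
assumed_genome_mb = 3_000

# Lower bound on total sequence implied by the read count and read length.
implied_total_mb = end_sequences * mean_read_length_bp / 1_000_000

# Fraction of the genome represented by the reported total.
genome_fraction = reported_total_mb / assumed_genome_mb

print(f"implied total: at least {implied_total_mb:.0f} Mb")
print(f"genome fraction: {genome_fraction:.1%}")
```

    The implied lower bound of 138 Mb is consistent with the reported 141 Mb, and 141 Mb over the assumed 3,000 Mb reproduces the quoted figure of about 4.7%.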

  20. Complete mitochondrial genome sequence of Romanogobio tenuicorpus (Amur whitefin gudgeon).

    PubMed

    Dong, Fang; Tong, Guang-Xiang; Kuang, You-Yi; Sun, Xiao-Wen

    2015-12-01

    The Amur whitefin gudgeon (Romanogobio tenuicorpus) belongs to the family Cyprinidae and is a freshwater aquaculture species in China. In this report, we determined the complete mitochondrial genome sequence of Romanogobio tenuicorpus, a 16,600 bp circular molecule with 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a control region. The conserved sequence blocks CSB1, CSB2 and CSB3 were also detected. PMID:24409923

  1. Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum.

    PubMed

    Grativol, Clícia; Regulski, Michael; Bertalan, Marcelo; McCombie, W Richard; da Silva, Felipe Rodrigues; Zerlotini Neto, Adhemar; Vicentini, Renato; Farinelli, Laurent; Hemerly, Adriana Silva; Martienssen, Robert A; Ferreira, Paulo Cavalcanti Gomes

    2014-07-01

    Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene-rich regions. Gene-enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methylation filtration and the assembly quality of sequences submitted to the gene-enrichment strategy, we have compared assemblies using methyl-filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with the unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA-seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds from encoded enzymes of the sucrose/starch pathway discovered 291 single-nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information achieved by the MF dataset provides a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop. PMID:24773339

  2. Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing

    NASA Astrophysics Data System (ADS)

    Derrida, Bernard; Fink, Thomas M.

    2002-02-01

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
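
    As a toy illustration of the setting described in this abstract (and not the authors' probabilistic analysis), the sketch below cuts a sequence into fragments with a constant overlap, shuffles them, and reassembles them by matching each fragment's suffix to the next fragment's prefix. The function names and parameters are hypothetical. Using distinct integers as symbols mimics the "infinite alphabet" case, where no overlap can match by chance.

```python
import random


def make_fragments(seq, frag_len, overlap):
    """Cut seq into fragments of frag_len whose neighbours share `overlap` symbols."""
    step = frag_len - overlap
    return [tuple(seq[i:i + frag_len])
            for i in range(0, len(seq) - frag_len + 1, step)]


def assemble(frags, overlap):
    """Rebuild the sequence by chaining suffix -> prefix matches.

    Returns None if the overlaps are ambiguous or the chain is incomplete.
    """
    by_prefix = {f[:overlap]: f for f in frags}
    if len(by_prefix) != len(frags):
        return None  # two fragments share a prefix: the puzzle is ambiguous
    suffixes = {f[-overlap:] for f in frags}
    starts = [f for f in frags if f[:overlap] not in suffixes]
    if len(starts) != 1:
        return None  # no unique leftmost fragment
    current = starts[0]
    result = list(current)
    used = 1
    while current[-overlap:] in by_prefix and used < len(frags):
        current = by_prefix[current[-overlap:]]
        result.extend(current[overlap:])
        used += 1
    return result if used == len(frags) else None


original = list(range(20))                 # every symbol is unique
fragments = make_fragments(original, frag_len=5, overlap=2)
random.shuffle(fragments)                  # forget the original order
recovered = assemble(fragments, overlap=2)
print(recovered == original)
```

    With a small alphabet, repeated overlaps make the puzzle ambiguous and `assemble` returns None; how often that happens is the kind of question the paper treats analytically.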

  3. Nucleotide sequence stability of the genome of hepatitis delta virus.

    PubMed

    Netter, H J; Wu, T T; Bockol, M; Cywinski, A; Ryu, W S; Tennant, B C; Taylor, J M

    1995-03-01

    Cultured cells were cotransfected with a fully sequenced 1,679-base cDNA clone of human hepatitis delta virus (HDV) RNA genome and a cDNA for the genome of woodchuck hepatitis virus (WHV). The HDV particles released were able to infect a woodchuck that was chronically infected with WHV. The HDV so produced was passaged a total of six times in woodchucks in order to determine the stability of the HDV nucleotide sequence. During a final chronic infection with such virus, liver RNA was extracted, and the HDV nucleotide sequence for the 352-base region, positions 905 to 1256, was obtained. By means of PCR, we obtained double-stranded cDNA both for direct sequencing and also for molecular cloning followed by sequencing. By direct sequencing, we found that a consensus sequence existed and was identical to the original sequence. From the sequences of 31 clones, we found 32% (10 of 31) to be identical to the original single nucleotide sequence. For the remainder, there were neither insertions nor deletions but there was a small number of single-nucleotide changes. These changes were predominantly transitions rather than transversions. Furthermore, the transitions were largely of just two types, uridine to cytidine and adenosine to guanosine. Of the 40 changes detected on HDV, 35% (14 of 40) occurred within an eight-nucleotide region that included position 1012, previously shown to be a site of RNA editing. These findings may have significant implications regarding both the stability of the HDV RNA genome and the mechanism of RNA editing. PMID:7853505

  4. LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Task 1.4.2 Report

    SciTech Connect

    Slezak, T; Borucki, M; Lam, M; Lenhoff, R; Vitalis, E

    2010-01-26

    Good progress has been made on both bacterial and viral sequencing by the TMTI centers. While access to appropriate samples is a limiting factor to throughput, excellent progress has been made with respect to getting agreements in place with key sources of relevant materials. Sharing of sequenced genomes funded by TMTI has been extremely limited to date. The April 2010 exercise should force a resolution to this, but additional managerial pressures may be needed to ensure that rapid sharing of TMTI-funded sequencing occurs, regardless of collaborator constraints concerning ultimate publication(s). Policies to permit TMTI-internal rapid sharing of sequenced genomes should be written into all TMTI agreements with collaborators now being negotiated. TMTI needs to establish a Web-based system for tracking samples destined for sequencing. This includes metadata on sample origins and contributor, information on sample shipment/receipt, prioritization by TMTI, assignment to one or more sequencing centers (including possible TMTI-sponsored sequencing at a contributor site), and status history of the sample sequencing effort. While this system could be a component of the AFRL system, it is not part of any current development effort. Policy and standardized procedures are needed to ensure appropriate verification of all TMTI samples prior to the investment in sequencing. PCR, arrays, and classical biochemical tests are examples of potential verification methods. Verification is needed to detect mislabeled, degraded, mixed or contaminated samples. Regular QC exercises are needed to ensure that the TMTI-funded centers are meeting all standards for producing quality genomic sequence data.

  5. Mitochondrial Genome Sequences Effectively Reveal the Phylogeny of Hylobates Gibbons

    PubMed Central

    Chan, Yi-Chiao; Roos, Christian; Inoue-Murayama, Miho; Inoue, Eiji; Shih, Chih-Chin; Pei, Kurtis Jai-Chyi; Vigilant, Linda

    2010-01-01

    Background: Uniquely among hominoids, gibbons exist as multiple geographically contiguous taxa exhibiting distinctive behavioral, morphological, and karyotypic characteristics. However, our understanding of the evolutionary relationships of the various gibbons, especially among Hylobates species, is still limited because previous studies used limited taxon sampling or short mitochondrial DNA (mtDNA) sequences. Here we use mtDNA genome sequences to reconstruct gibbon phylogenetic relationships and reveal the pattern and timing of divergence events in gibbon evolutionary history. Methodology/Principal Findings: We sequenced the mitochondrial genomes of 51 individuals representing 11 species belonging to three genera (Hylobates, Nomascus and Symphalangus) using the high-throughput 454 sequencing system with the parallel tagged sequencing approach. Three phylogenetic analyses (maximum likelihood, Bayesian analysis and neighbor-joining) depicted the gibbon phylogenetic relationships congruently and with strong support values. Most notably, we recover a well-supported phylogeny of the Hylobates gibbons. The estimation of divergence times using Bayesian analysis with relaxed clock model suggests a much more rapid speciation process in Hylobates than in Nomascus. Conclusions/Significance: Use of more than 15 kb sequences of the mitochondrial genome provided more informative and robust data than previous studies of short mitochondrial segments (e.g., control region or cytochrome b) as shown by the reliable reconstruction of divergence patterns among Hylobates gibbons. Moreover, molecular dating of the mitogenomic divergence times implied that biogeographic change during the last five million years may be a factor promoting the speciation of Sundaland animals, including Hylobates species. PMID:21203450

  6. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-01

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research. PMID:26638067

  7. Island length distribution in genome sequencing.

    PubMed

    Percus, O E; Percus, J K

    1999-09-01

    We consider the general problem of constructing a physical map of a genome by welding islands of overlapping clones. Both distribution of clone length and non-uniform probability of overlap detection are taken into account, the latter restricted to the Markov case in which only the location of the end of the developing island is required. Exact results for the distribution of island length are obtained in the special cases of fixed clone length or rigid overlap criterion, and mean and variance for the general situation. Determination of ocean length distribution permits island number and contig number distributions to be found as well. PMID:10501922

  8. Draft Genome Sequence of Bacillus endophyticus 2102

    PubMed Central

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung

    2012-01-01

    Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents. PMID:23012284

  9. Sequencing a North American yak genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Livestock researchers are beginning to identify beneficial effects of natural genetic variation in livestock. For example, comparing gene sequences from related species has helped identify the underlying mechanisms of traits like coat color, fertility, and disease resistance. Although cattle and y...

  10. Genome Sequence of Brevibacillus formosus F12T for a Genome-Sequencing Project for Genomic Taxonomy and Phylogenomics of Bacillus-Like Bacteria

    PubMed Central

    Wang, Jie-Ping; Liu, Guo-Hong; Chen, Qian-qian; Zhu, Yu-jing; Chen, Zheng; Che, Jian-mei

    2015-01-01

    Brevibacillus formosus F12T is a Gram-positive, spore-forming, and strictly aerobic bacterium. Here, we report the draft 6.215-Mb genome sequence of B. formosus F12T, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus-like bacteria, as well as for the functional gene mining and application of B. formosus. PMID:26205874

  11. Mutator System Derivatives Isolated from Sugarcane Genome Sequence.

    PubMed

    Manetti, M E; Rossi, M; Cruz, G M Q; Saccaro, N L; Nakabashi, M; Altebarmakian, V; Rodier-Goud, M; Domingues, D; D'Hont, A; Van Sluys, M A

    2012-09-01

    Mutator-like transposase is the most represented transposon transcript in the sugarcane transcriptome. Phylogenetic reconstructions derived from sequenced transcripts provided evidence that at least four distinct classes exist (I-IV) and that diversification among these classes occurred early in Angiosperms, prior to the divergence of Monocots/Eudicots. The four previously described classes served as probes to select and further sequence six BAC clones from a genomic library of cultivar R570. A total of 579,352 sugarcane base pairs were produced from these "Mutator system" BAC containing regions for further characterization. The analyzed genomic regions confirmed that the predicted structure and organization of the Mutator system in sugarcane is composed of two true transposon lineages, each containing a specific terminal inverted repeat and two transposase lineages considered to be domesticated. Each Mutator transposase class displayed a particular molecular structure supporting lineage specific evolution. MUSTANG, previously described domesticated genes, are located in syntenic regions across Sacharineae and, as expected for a host functional gene, possess the same gene structure as in other Poaceae. Two sequenced BACs correspond to hom(eo)logous locus with specific retrotransposon insertions that discriminate sugarcane haplotypes. The comparative studies presented add information to the Mutator systems previously identified in the maize and rice genomes by describing lineage specific molecular structure and genomic distribution pattern in the sugarcane genome. PMID:22905278

  12. Genome Sequencing Highlights the Dynamic Early History of Dogs

    PubMed Central

    Freedman, Adam H.; Gronau, Ilan; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Ostrander, Elaine A.; Marques-Bonet, Tomas; Wayne, Robert K.; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary. PMID:24453982

  13. Genome sequencing highlights the dynamic early history of dogs.

    PubMed

    Freedman, Adam H; Gronau, Ilan; Schweizer, Rena M; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R; Parker, Heidi G; Lee, Clarence; Tadigotla, Vasisht; Wilton, Alan; Siepel, Adam; Bustamante, Carlos D; Harkins, Timothy T; Nelson, Stanley F; Ostrander, Elaine A; Marques-Bonet, Tomas; Wayne, Robert K; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary. PMID:24453982

  14. [Genome sequencing and personalized medicine: perspectives and limitations].

    PubMed

    Le Gall, Jean-Yves; Debré, Patrice

    2014-01-01

    DNA sequencing technologies have advanced at an exponential rate in recent years: the first human genome was sequenced in 2001 after many years of effort by dozens of international laboratories at a cost of tens of millions of dollars, while in 2013 a genome can be sequenced within 24 hours for a few hundred dollars (exome sequencing takes only a few hours). More and more hospital laboratories are acquiring new high-throughput sequencing devices ("next-generation sequencers", NGS), allowing them to analyze tens or hundreds of genes, or even the entire exome. This is having a major impact on medical concepts and practices, especially with respect to genetics and oncology. This ability to search for mutations simultaneously in a large number of genes is finding applications in the diagnosis of Mendelian diseases (including at birth), routine screening for heterozygotes, and pre-conception diagnosis. NGS is now sufficiently sensitive to analyze circulating fetal DNA in maternal blood (cell-free fetal DNA, cffDNA), enabling applications such as noninvasive diagnosis of fetal sex (and X-linked diseases), fetal rhesus among rhesus-negative women, trisomy and, in the near future, Mendelian mutations. Data on multifactorial diseases are still preliminary, but it should soon be possible to identify "strong" factors of genetic predisposition that have so far been beyond the scope of genome-wide association studies (GWAS). In the field of constitutional oncogenetics, NGS can also be used for simultaneous analysis of genes involved in "hereditary" cancers (21 breast cancer genes, 6 colon cancer genes, etc.). More generally, NGS can identify all genomic abnormalities (deletions, translocations, mutations) in a given malignant tissue (hemopathy or solid tumor), and has the potential to distinguish important mutations (those that drive tumor progression) from "bystander" or accessory mutations, and also to identify "druggable" mutations amenable to targeted therapies (e.g. imatinib and Bcr/Abl rearrangement; vemurafenib and the BRAF V600E mutation). Systematic sequencing of all the genes involved in drug metabolism and responsiveness will lead to individualized pharmacogenetics. Finally, sequencing of the tumoral and constitutional genomes, identification of somatic mutations, and detection of pharmacogenetic variants will open up the era of personalized medicine. The first results of these targeted therapeutic indications show a gain in the duration of remission and survival, although the cost-effectiveness of these approaches remains to be determined. Finally, this huge capacity for genome sequencing raises a number of regulatory and ethical issues. PMID:26259290

  15. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)

    PubMed Central

    Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

    2013-01-01

    Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325
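
    As a rough illustration of the concordance figure quoted above, the following sketch counts position-wise differences between an NGS-derived and a Sanger-derived sequence and reports them as a percentage. The function name `percent_difference` and the toy sequences are assumptions made for illustration only; the abstract does not describe the study's own comparison pipeline, and real comparisons would work on aligned haplotype calls rather than raw equal-length strings.

```python
def percent_difference(ngs_seq: str, sts_seq: str) -> float:
    """Return the percentage of positions at which two aligned sequences differ."""
    if len(ngs_seq) != len(sts_seq):
        raise ValueError("sequences must be aligned to the same length")
    if not ngs_seq:
        raise ValueError("sequences must not be empty")
    # Compare case-insensitively, one aligned position at a time.
    mismatches = sum(a != b for a, b in zip(ngs_seq.upper(), sts_seq.upper()))
    return 100.0 * mismatches / len(ngs_seq)


# Toy example (hypothetical data): one mismatch in ten aligned positions.
toy_ngs = "ACGTACGTAC"
toy_sts = "ACGTACGTAA"
print(percent_difference(toy_ngs, toy_sts))  # 10.0
```

    On this scale, a difference rate below 0.02% corresponds to fewer than 200 differing positions per million bases compared.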

  16. Complete Genome Sequence of Caulobacter crescentus Podophage Percy

    PubMed Central

    Lerma, Roxann A.; Tidwell, T. J.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Podophage Percy infects Caulobacter crescentus, a Gram-negative bacterium that divides asymmetrically and is a commonly used model organism to study the cell cycle, asymmetric cell division, and cell differentiation. Here, we announce the sequence and annotated complete genome of the phiKMV-like podophage Percy and note its prominent features. PMID:26607888

  17. Complete Genome Sequence of Bordetella pertussis D420

    PubMed Central

    Boinett, Christine J.; Harris, Simon R.; Langridge, Gemma C.; Trainor, Elizabeth A.; Merkel, Tod J.

    2015-01-01

    Bordetella pertussis is the causative agent of whooping cough, a highly contagious, acute respiratory illness that has seen resurgence despite the use of vaccines. We present the complete genome sequence of a clinical strain of B. pertussis, D420, which is representative of a currently circulating clade of this pathogen. PMID:26067980

  18. Complete Genome Sequence of Bordetella pertussis D420.

    PubMed

    Boinett, Christine J; Harris, Simon R; Langridge, Gemma C; Trainor, Elizabeth A; Merkel, Tod J; Parkhill, Julian

    2015-01-01

    Bordetella pertussis is the causative agent of whooping cough, a highly contagious, acute respiratory illness that has seen resurgence despite the use of vaccines. We present the complete genome sequence of a clinical strain of B. pertussis, D420, which is representative of a currently circulating clade of this pathogen. PMID:26067980

  19. Complete Genome Sequence of Agrobacterium tumefaciens Ach5

    PubMed Central

    Huang, Ya-Yi; Cho, Shu-Ting; Lo, Wen-Sui; Wang, Yi-Chieh; Lai, Erh-Min

    2015-01-01

    Agrobacterium tumefaciens is a phytopathogenic bacterium that causes crown gall disease. The strain Ach5 was isolated from yarrow (Achillea ptarmica L.) and is the wild-type progenitor of other derived strains widely used for plant transformation. Here, we report the complete genome sequence of this bacterium. PMID:26044425

  20. Complete genome sequence of canine papillomavirus type 16.

    PubMed

    Luff, Jennifer; Mader, Michelle; Britton, Monica; Fass, Joseph; Rowland, Peter; Orr, Carolyn; Schlegel, Richard; Yuan, Hang

    2015-01-01

    Papillomaviruses are epitheliotropic, circular, double-stranded DNA viruses within the family Papillomaviridae that are associated with benign and malignant tumors in humans and animals. We report the complete genome sequence of canine papillomavirus type 16 identified within multiple pigmented cutaneous plaques and squamous cell carcinoma from an intact female Basenji dog. PMID:25953189

  1. Draft Genome Sequence of the Cellulolytic Fungus Chaetomium globosum.

    PubMed

    Cuomo, Christina A; Untereiner, Wendy A; Ma, Li-Jun; Grabherr, Manfred; Birren, Bruce W

    2015-01-01

    Chaetomium globosum is a filamentous fungus typically isolated from cellulosic substrates. This species also causes superficial infections of humans and, more rarely, can cause cerebral infections. Here, we report the genome sequence of C. globosum isolate CBS 148.51, which will facilitate the study and comparative analysis of this fungus. PMID:25720678

  2. Genome Sequence of Porphyromonas gingivalis Strain A7436

    PubMed Central

    Xie, Gary; Bélanger, Myriam; Kumar, Dibyendu; Whitlock, Joan A.; Liu, Li; Farmerie, William G.; Daligault, Hajnalka E.; Han, Cliff S.; Brettin, Thomas S.

    2015-01-01

    Porphyromonas gingivalis is strongly associated with periodontitis. P. gingivalis strain trafficking and tissue homing differ widely, even among presumptive closely related strains, such as W83 and A7436. Here, we present the genome sequence of A7436 with a single contig of 2,367,029 bp and a G+C content of 48.33%. PMID:26404590
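
    Genome announcements such as this one summarize an assembly by its contig length and G+C content. The sketch below shows one common way such a percentage can be computed from a sequence string; the function name `gc_content` and the toy sequence are hypothetical, and the abstract does not state which tool the authors used to obtain their figures.

```python
def gc_content(sequence: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    # Ignore ambiguity codes such as N so they do not dilute the percentage.
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Toy example (hypothetical data): four of eight bases are G or C.
print(gc_content("GGCCAATT"))  # 50.0
```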

  3. Draft Genome Sequence of Pseudomonas moraviensis R28-S.

    PubMed

    Hunter, Samuel S; Yano, Hirokazu; Loftie-Eaton, Wesley; Hughes, Julie; De Gelder, Leen; Stragier, Pieter; De Vos, Paul; Settles, Matthew L; Top, Eva M

    2014-01-01

    We report the draft genome sequence of Pseudomonas moraviensis R28-S, isolated from the municipal wastewater treatment plant of Moscow, ID. The strain carries a native mercury resistance plasmid, poorly maintains introduced IncP-1 antibiotic resistance plasmids, and has been useful for studying the evolution of plasmid host range and stability. PMID:24558233

  4. Draft Genome Sequence of Pseudomonas moraviensis R28-S

    PubMed Central

    Yano, Hirokazu; Loftie-Eaton, Wesley; Hughes, Julie; De Gelder, Leen; Stragier, Pieter; De Vos, Paul; Settles, Matthew L.

    2014-01-01

    We report the draft genome sequence of Pseudomonas moraviensis R28-S, isolated from the municipal wastewater treatment plant of Moscow, ID. The strain carries a native mercury resistance plasmid, poorly maintains introduced IncP-1 antibiotic resistance plasmids, and has been useful for studying the evolution of plasmid host range and stability. PMID:24558233

  5. Complete Genome Sequence of Biocontrol Strain Pseudomonas fluorescens LBUM 223

    PubMed Central

    Roquigny, Roxane; Arseneault, Tanya; Gadkar, Vijay J.; Novinscak, Amy

    2015-01-01

    Pseudomonas fluorescens LBUM 223 is a plant growth-promoting rhizobacterium (PGPR) with biocontrol activity against various plant pathogens. It produces the antimicrobial metabolite phenazine-1-carboxylic acid, which is involved in the biocontrol of Streptomyces scabies, the causal agent of common scab of potato. Here, we report the complete genome sequence of P. fluorescens LBUM 223. PMID:25953163

  6. Draft Genome Sequence of Rice Isolate Pseudomonas chlororaphis EA105

    PubMed Central

    McCully, Lucy M.; Bitzer, Adam S.; Spence, Carla A.; Bais, Harsh P.

    2014-01-01

    Pseudomonas chlororaphis EA105, a strain isolated from rice rhizosphere, has shown antagonistic activities against a rice fungal pathogen, and could be important in defense against rice blast. We report the draft genome sequence of EA105, which has an estimated size of 6.6 Mb. PMID:25540352

  7. PARTIAL MITOCHONDRIAL GENOME SEQUENCES OF OSTRINIA NUBILALIS AND OSTRINIA FURNACALIS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Contiguous, near-complete mitochondrial genome sequences of 14,535 nt and 14,536 nt were obtained for Ostrinia nubilalis and Ostrinia furnacalis, respectively. Translocation of trnM was observed compared to Drosophila, and the hexanucleotide ATTTAG may initiate cox1 translation. The overall percentage AT ...

  8. Complete genome sequence of Robiginitalea biformata HTCC2501.

    PubMed

    Oh, Hyun-Myung; Giovannoni, Stephen J; Lee, Kiyoung; Ferriera, Steve; Johnson, Justin; Cho, Jang-Cheon

    2009-11-01

    Robiginitalea biformata HTCC2501, isolated from the Sargasso Sea by dilution-to-extinction culturing, has been known as an aerobic chemoheterotroph with carotenoid pigments and dimorphic growth phases. Here, we announce the complete sequence of the R. biformata HTCC2501 genome, which contains genes for carotenoid biosynthesis and several macromolecule-degrading enzymes. PMID:19767438

  9. PHYTOPHTHORA GENOME SEQUENCES UNCOVER EVOLUTIONARY ORIGINS AND MECHANISMS OF PATHOGENESIS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Draft genome sequences of the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum have been determined to depths of 9x and 7.7x, respectively. Oomycetes such as these Phytophthora species share the kingdom Stramenopiles with photosynthetic algae such as diatoms...

  10. Complete Genome Sequence of Bacillus megaterium Siphophage Silence

    PubMed Central

    Solis, Jonathan A.; Farmer, Nicholas G.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Silence is a newly isolated siphophage that infects Bacillus megaterium, a soil bacterium that is used readily in research and commercial applications. A study of B. megaterium phage Silence will enhance our knowledge of the diversity of Bacillus phages. Here, we describe the complete genome sequence and annotated features of Silence. PMID:26450722

  11. Complete Genome Sequence of Enterotoxigenic Escherichia coli Myophage Murica

    PubMed Central

    Wilder, Joseph N.; Lancaster, Jacob C.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Murica is an rv5-like myophage that infects enterotoxigenic Escherichia coli. Pathogenic E. coli strains are responsible for many intestinal diseases, and phages that infect these bacteria may prove useful in preventing severe health issues. The following is a report of the complete genome sequence of Murica and its important features. PMID:26430048

  12. Complete Genome Sequence of Citrobacter freundii Myophage Mordin

    PubMed Central

    Guan, Jingwen; Snowden, Jeffrey D.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Citrobacter freundii is a Gram-negative opportunistic pathogen that is associated with urinary tract infections. Bacteriophages infecting C. freundii can be used as an effective treatment to fight these infections. Here, we announce the complete genome sequence of the C. freundii Felix O1-like myophage Mordin and describe its features. PMID:26472844

  13. Draft Genome Sequence of Bacillus subtilis strain KATMIRA1933

    PubMed Central

    Melnikov, Vyacheslav G.; Chikindas, Michael L.

    2014-01-01

    In this report, we present a draft sequence of Bacillus subtilis KATMIRA1933. Previous studies demonstrated probiotic properties of this strain partially attributed to production of an antibacterial compound, subtilosin. Comparative analysis of this strain’s genome with that of a commercial probiotic strain, B. subtilis Natto, is presented. PMID:24948771

  14. Complete Genome Sequence of Vibrio vulnificus Bacteriophage SSP002

    PubMed Central

    Lee, Hyun Sung; Choi, Slae

    2012-01-01

    Vibrio vulnificus phages are abundant in coastal marine environments, shellfish, clams, and oysters. SSP002, a V. vulnificus-specific bacteriophage, was isolated from oysters from the west coast of South Korea. In this study, the complete genome of SSP002 was sequenced and analyzed for the first time among the V. vulnificus-specific bacteriophages. PMID:22733877

  15. Complete genome sequence of Vibrio vulnificus bacteriophage SSP002.

    PubMed

    Lee, Hyun Sung; Choi, Slae; Choi, Sang Ho

    2012-07-01

    Vibrio vulnificus phages are abundant in coastal marine environments, shellfish, clams, and oysters. SSP002, a V. vulnificus-specific bacteriophage, was isolated from oysters from the west coast of South Korea. In this study, the complete genome of SSP002 was sequenced and analyzed for the first time among the V. vulnificus-specific bacteriophages. PMID:22733877

  16. WHEAT AND RYE GENOME CODING SEQUENCE VARIATION IN TRITICALE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The coding sequences of triticale were studied by using RFLP and AFLP analyses with fifty oat EST clones and forty primer pairs, respectively, to analyze the genome banding profiles of four primary triticales and their wheat and rye progenitors. The results showed that almost 100 percent of the EST...

  17. Transcriptome and genome sequencing uncovers functional variation in humans.

    PubMed

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; 't Hoen, Peter A C; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk P J; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Angel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-09-26

    Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome. PMID:24037378

  18. Complete Genome Sequence of Enterococcus Bacteriophage EFLK1

    PubMed Central

    Khalifa, Leron; Coppenhagen-Glazer, Shunit; Shlezinger, Mor; Kott-Gutkowski, Miriam; Adini, Omri; Beyth, Nurit

    2015-01-01

    We previously isolated EFDG1, a lytic phage against enterococci for therapeutic use. Nevertheless, EFDG1-resistant bacterial strains (EFDG1r) have evolved. EFLK1, a new highly effective phage against EFDG1r strains, was isolated in this study. The genome of EFLK1 was fully sequenced, analyzed, and deposited in GenBank. PMID:26586876

  19. Complete Genome Sequence of Biocontrol Strain Pseudomonas fluorescens LBUM223.

    PubMed

    Roquigny, Roxane; Arseneault, Tanya; Gadkar, Vijay J; Novinscak, Amy; Joly, David L; Filion, Martin

    2015-01-01

    Pseudomonas fluorescens LBUM223 is a plant growth-promoting rhizobacterium (PGPR) with biocontrol activity against various plant pathogens. It produces the antimicrobial metabolite phenazine-1-carboxylic acid, which is involved in the biocontrol of Streptomyces scabies, the causal agent of common scab of potato. Here, we report the complete genome sequence of P. fluorescens LBUM223. PMID:25953163

  20. Complete Genome Sequences of Six South African Rabies Viruses

    PubMed Central

    Phahladira, Baby; Marston, Denise A.; Wise, Emma L.; Ellis, Richard J.; Fooks, Anthony R.

    2015-01-01

    South African rabies viruses (RABVs) from dogs and jackals (canid viruses) are highly related and most likely originated from a single progenitor. RABV is the cause of most global human rabies cases. The complete genome sequences of 3 RABVs from South Africa and Zimbabwe are reported here. PMID:26430028

  1. Characterization of reniform nematode genome through shotgun sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The reniform nematode (RN), a major agricultural pest, particularly on cotton in the United States (U.S.), is among the major plant parasitic nematodes for which limited genomic information exists. In this study, over 380 Mb of sequence data were generated from four pooled adult female RN and assembl...

  2. Draft Genome Sequence of Rhodotorula mucilaginosa, an Emergent Opportunistic Pathogen

    PubMed Central

    Deligios, Massimo; Fraumene, Cristina; Abbondio, Marcello; Mannazzu, Ilaria; Tanca, Alessandro; Addis, Maria Filippa

    2015-01-01

    Rhodotorula mucilaginosa, a yeast with valuable biotechnological features, has also been recorded as an emergent opportunistic pathogen that might cause disease in both immunocompetent and immunocompromised individuals. Here, we report the draft genome sequence of R. mucilaginosa strain C2.5t1, which was isolated from cacao seeds in Cameroon. PMID:25858834

  3. Genome Sequence of Avirulent Riemerella anatipestifer Strain RA-JLLY

    PubMed Central

    Zhang, Rongrong; Luo, Qingping; Wen, Guoyuan; Ai, Diyun; Wang, Honglin; Luo, Ling; Wang, Hongcai

    2015-01-01

    Riemerella anatipestifer is an important bacterial pathogen associated with epizootic infections in waterfowl and various other birds. Riemerella anatipestifer strain RA-JLLY is an avirulent strain, isolated from the brain of an old duck in Hubei province, China. Here, we report the genome sequence of this species. PMID:26404587

  4. Complete genomic sequence of rabies virus from an Ethiopian wolf.

    PubMed

    Marston, Denise A; Wise, Emma L; Ellis, Richard J; McElhinney, Lorraine M; Banyard, Ashley C; Johnson, Nicholas; Deressa, Asefa; Regassa, Fekede; de Lamballerie, Xavier; Fooks, Anthony R; Sillero-Zubiri, Claudio

    2015-01-01

    Ethiopian wolves are the rarest canid in the world, with only 500 found in the Ethiopian highlands. Rabies poses the most immediate threat to their survival, causing epizootic cycles of mass mortality. The complete genome sequence of a rabies virus (RABV) derived from an Ethiopian wolf during the most recent epizootic is reported here. PMID:25814597

  5. Genome Sequence of Mucoid Pseudomonas aeruginosa Strain FRD1.

    PubMed

    Wang, Di; Hildebrand, Falk; Ye, Lumeng; Wei, Qing; Ma, Luyan Z

    2015-01-01

    Pseudomonas aeruginosa is an important opportunistic pathogen. Strain FRD1 is a mucoid isolate from the sputum of a cystic fibrosis patient. It has been widely studied and has many different phenotypes compared to nonmucoid strains. Here, we present the draft genome sequence of P. aeruginosa strain FRD1 to gain insight into mucoid isolates. PMID:25908149

  6. Analysis of Chimpanzee History Based on Genome Sequence Alignments

    E-print Network

    Reich, David

    Analysis of Chimpanzee History Based on Genome Sequence Alignments Jennifer L. Caswell1 of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human was previously available. We show that bonobos and common chimpanzees were separated ~1,290,000 years ago

  7. Genome Sequences of Three Turkey Orthoreovirus Strains Isolated in Hungary

    PubMed Central

    Dandár, Eszter; Fehér, Enikő; Bálint, Ádám; Kisfali, Péter; Melegh, Béla; Mató, Tamás; Kecskeméti, Sándor; Palya, Vilmos; Bányai, Krisztián

    2015-01-01

    We have investigated the genomic properties of three turkey reovirus strains—19831M09, D1246, and D1104—isolated in Hungary in 2009. Sequence identity values and phylogenetic calculations indicated genetic conservativeness among the studied Hungarian strains and a close relationship with strains isolated in the United States. PMID:26586882

  8. FEMALE-SPECIFIC DNA SEQUENCES IN THE CHICKEN GENOME

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Female-specific regions in the chicken's genome were detected in silico. Eight fragments out of 21 that were in-silico W-specific, were shown to produce PCR products only in females. Some of these fragments gave a female-specific product in turkeys and peacocks. We sequenced all eight fragments in o...

  9. Genome Sequence of Porphyromonas gingivalis Strain AJW4

    PubMed Central

    Xie, Gary; Bélanger, Myriam; Kumar, Dibyendu; Whitlock, Joan A.; Liu, Li; Farmerie, William G.; Daligault, Hajnalka E.; Han, Cliff S.; Brettin, Thomas S.

    2015-01-01

    Porphyromonas gingivalis is associated with oral and systemic diseases. Strain-specific P. gingivalis invasion phenotypes have been correlated with disease presentation in infected laboratory animals. Here, we present the genome sequence of AJW4, a minimally invasive strain, with a single contig of 2,372,492 bp and a G+C content of 48.27%. PMID:26543127

  10. Draft Genome Sequence of Rhodotorula mucilaginosa, an Emergent Opportunistic Pathogen.

    PubMed

    Deligios, Massimo; Fraumene, Cristina; Abbondio, Marcello; Mannazzu, Ilaria; Tanca, Alessandro; Addis, Maria Filippa; Uzzau, Sergio

    2015-01-01

    Rhodotorula mucilaginosa, a yeast with valuable biotechnological features, has also been recorded as an emergent opportunistic pathogen that might cause disease in both immunocompetent and immunocompromised individuals. Here, we report the draft genome sequence of R. mucilaginosa strain C2.5t1, which was isolated from cacao seeds in Cameroon. PMID:25858834

  11. Draft Genome Sequence of the Fungus Penicillium brasilianum MG11.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Brakhage, Axel A; Valiante, Vito

    2015-01-01

    The genus Penicillium belongs to the phylum Ascomycota and includes a variety of fungal species important for food and drug production. We report the draft genome sequence of Penicillium brasilianum MG11. This strain was isolated from soil, and it was reported to produce different secondary metabolites. PMID:26337871

  12. Draft genome sequence of the mulberry tree Morus notabilis

    PubMed Central

    He, Ningjia; Zhang, Chi; Qi, Xiwu; Zhao, Shancen; Tao, Yong; Yang, Guojun; Lee, Tae-Ho; Wang, Xiyin; Cai, Qingle; Li, Dong; Lu, Mengzhu; Liao, Sentai; Luo, Guoqing; He, Rongjun; Tan, Xu; Xu, Yunmin; Li, Tian; Zhao, Aichun; Jia, Ling; Fu, Qiang; Zeng, Qiwei; Gao, Chuan; Ma, Bi; Liang, Jiubo; Wang, Xiling; Shang, Jingzhe; Song, Penghua; Wu, Haiyang; Fan, Li; Wang, Qing; Shuai, Qin; Zhu, Juanjuan; Wei, Congjin; Zhu-Salzman, Keyan; Jin, Dianchuan; Wang, Jinpeng; Liu, Tao; Yu, Maode; Tang, Cuiming; Wang, Zhenjiang; Dai, Fanwei; Chen, Jiafei; Liu, Yan; Zhao, Shutang; Lin, Tianbao; Zhang, Shougong; Wang, Junyi; Wang, Jian; Yang, Huanming; Yang, Guangwei; Wang, Jun; Paterson, Andrew H.; Xia, Qingyou; Ji, Dongfeng; Xiang, Zhonghuai

    2013-01-01

    Human utilization of the mulberry–silkworm interaction started at least 5,000 years ago and greatly influenced world history through the Silk Road. Complementing the silkworm genome sequence, here we describe the genome of a mulberry species Morus notabilis. In the 330-Mb genome assembly, we identify 128 Mb of repetitive sequences and 29,338 genes, 60.8% of which are supported by transcriptome sequencing. Mulberry gene sequences appear to evolve ~3 times faster than other Rosales, perhaps facilitating the species’ spread worldwide. The mulberry tree is among a few eudicots but several Rosales that have not preserved genome duplications in more than 100 million years; however, a neopolyploid series found in the mulberry tree and several others suggest that new duplications may confer benefits. Five predicted mulberry miRNAs are found in the haemolymph and silk glands of the silkworm, suggesting interactions at molecular levels in the plant–herbivore relationship. The identification and analyses of mulberry genes involved in diversifying selection, resistance and protease inhibitor expressed in the laticifers will accelerate the improvement of mulberry plants. PMID:24048436

  13. RECOGNITION OF POLYADENYLATION SITES FROM ARABIDOPSIS GENOMIC SEQUENCES

    E-print Network

    Wong, Limsoon

    site. The selection of polyadenylation sites are determined by polyadenylation signals or cis1 RECOGNITION OF POLYADENYLATION SITES FROM ARABIDOPSIS GENOMIC SEQUENCES CHUAN HOCK KOH LIMSOON. The ability to predict polyadenylation site will allow us to define gene boundaries, predict number of genes

  14. Draft Genome Sequence of the Sexually Transmitted Pathogen

    E-print Network

    Gent, Universiteit

    that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown was identified (Table 1), endowing T. vaginalis with one of the highest coding capacities among eukaryotes (table

  15. Complete Genome Sequences of Six South African Rabies Viruses.

    PubMed

    Sabeta, Claude; Phahladira, Baby; Marston, Denise A; Wise, Emma L; Ellis, Richard J; Fooks, Anthony R

    2015-01-01

    South African rabies viruses (RABVs) from dogs and jackals (canid viruses) are highly related and most likely originated from a single progenitor. RABV is the cause of most global human rabies cases. The complete genome sequences of 3 RABVs from South Africa and Zimbabwe are reported here. PMID:26430028

  16. Genome Sequence of Enterotoxigenic Escherichia coli Strain B2C.

    PubMed

    Madhavan, T P Vipin; Steen, Jason A; Hugenholtz, Philip; Sakellaris, Harry

    2014-01-01

    Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal disease around the globe, causing an estimated 380,000 deaths annually. The disease is caused by a wide variety of strains. Here, we report the genome sequence of ETEC strain B2C, which was isolated from an American soldier in Vietnam. PMID:24723709

  17. Genome Sequence of Enterotoxigenic Escherichia coli Strain B2C

    PubMed Central

    Vipin Madhavan, T. P.; Steen, Jason A.; Hugenholtz, Philip

    2014-01-01

    Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal disease around the globe, causing an estimated 380,000 deaths annually. The disease is caused by a wide variety of strains. Here, we report the genome sequence of ETEC strain B2C, which was isolated from an American soldier in Vietnam. PMID:24723709

  18. Draft Genome Sequence of Shewanella sp. Strain CP20.

    PubMed

    Lutz, Carla; Martin Tay, Qi Xiang; Sun, Shuyang; McDougald, Diane

    2015-01-01

    Shewanella sp. CP20 is a marine bacterium that survives ingestion by Tetrahymena pyriformis and is expelled from the protozoan within membrane-bound vacuoles, where the bacterial cells show long-term survival. Here, we report the draft genome sequence of Shewanella sp. CP20 and discuss the potential mechanisms facilitating intraprotozoan survival. PMID:25858840

  19. Draft Genome Sequence of the Fungus Penicillium brasilianum MG11

    PubMed Central

    Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Brakhage, Axel A.

    2015-01-01

    The genus Penicillium belongs to the phylum Ascomycota and includes a variety of fungal species important for food and drug production. We report the draft genome sequence of Penicillium brasilianum MG11. This strain was isolated from soil, and it was reported to produce different secondary metabolites. PMID:26337871

  20. Draft Genome Sequence of Rhodococcus rhodochrous Strain ATCC 21198

    SciTech Connect

    Shields-Menard, Sara A.; Brown, Steven D; Klingeman, Dawn Marie; Indest, Karl; Hancock, Dawn; Wewalwela, Jayani; French, Todd; Donaldson, Janet

    2014-01-01

    Rhodococcus rhodochrous is a Gram-positive red-pigmented bacterium commonly found in the soil. The draft genome sequence for R. rhodochrous strain ATCC 21198 is presented here to provide genetic data for a better understanding of its lipid-accumulating capabilities.

  1. Genome Sequence of the Yeast Cyberlindnera fabianii (Hansenula fabianii)

    PubMed Central

    Freel, Kelle C.; Sarilar, Véronique; Neuvéglise, Cécile; Devillers, Hugo; Friedrich, Anne

    2014-01-01

    The yeast Cyberlindnera fabianii is used in wastewater treatment, fermentation of alcoholic beverages, and has caused blood infections. To assist in the accurate identification of this species, and to determine the genetic basis for properties involved in fermentation and water treatment, we sequenced and annotated the genome of C. fabianii (YJS4271). PMID:25103752

  2. Genome sequence of the palaeopolyploid soybean

    E-print Network

    Bhattacharyya, Madan Kumar

    10,17 , Randy C. Shoemaker3 & Scott A. Jackson5 Soybean (Glycine max) is one of the most important of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties. Legumes feed protein and cooking oil. We report here a soybean whole-genome shotgun sequence of Glycine max var

  3. Genome Sequence of the Paleopolyploid Soybean (Glycine max (L.) Merr.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report the genome sequence for soybean (Glycine max var. Williams 82), one of the most important crop plants worldwide because of its ability to produce both protein and oil. Soybean is a recently domesticated legume that plays a vital role in crop rotation as it fixes atmospheric nitrogen via s...

  4. Genome Sequence of a Urease-positive Campylobacter lari.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Campylobacter lari is frequently isolated from shore birds and can cause illness in humans. Here we report the draft whole genome sequence of a urease-positive strain of C. lari that was isolated in estuarial water on the coast of Delaware, USA....

  5. Len Gen: The international lentil genome sequencing project

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...

  6. The Genome Sequence of the Malaria Mosquito Anopheles gambiae

    E-print Network

    Salzberg, Steven

    The Genome Sequence of the Malaria Mosquito Anopheles gambiae Robert A. Holt,1 * G. Mani insights into the physiological adaptations of a hematophagous insect. The mosquito is both an elegant, exquisitely adapted organism and a scourge of humanity. The principal mosquito-borne human illnesses

  7. Complete genome sequence of phototrophic betaproteobacterium Rubrivivax gelatinosus IL144.

    PubMed

    Nagashima, Sakiko; Kamimura, Akiko; Shimizu, Takayuki; Nakamura-Isaki, Sanae; Aono, Eiji; Sakamoto, Koji; Ichikawa, Natsuko; Nakazawa, Hidekazu; Sekine, Mitsuo; Yamazaki, Shuji; Fujita, Nobuyuki; Shimada, Keizo; Hanada, Satoshi; Nagashima, Kenji V P

    2012-07-01

    Rubrivivax gelatinosus is a facultative photoheterotrophic betaproteobacterium living in freshwater ponds, sewage ditches, activated sludge, and food processing wastewater. There have not been many studies on photosynthetic betaproteobacteria. Here we announce the complete genome sequence of the best-studied phototrophic betaproteobacterium, R. gelatinosus IL-144 (NBRC 100245). PMID:22689232

  8. Low coverage sequencing of two Asian elephant (Elephas maximus) genomes

    PubMed Central

    2014-01-01

    Background: Three species of elephant exist: the Asian elephant (Elephas maximus) and two species of African elephant (Loxodonta africana and Loxodonta cyclotis). The populations of all three species are dwindling and are under threat from factors such as habitat destruction and ivory hunting. The species differ in many respects, including in their morphology and response to disease. The availability of genome sequence data from all three elephant species will complement studies of behaviour, genetic diversity, evolution and disease resistance. Findings: We present low-coverage Illumina sequence data from two Asian elephants, representing approximately 5X and 2.5X coverage, respectively. Both raw and aligned data are available, using the African elephant (L. africana) genome as a reference. Conclusions: The data presented here are an important addition to the available genetic and genomic information on Asian and African elephants. PMID:25053995

  9. Complete genome sequence of Ferrimonas balearica type strain (PATT)

    SciTech Connect

    Nolan, Matt; Sikorski, Johannes; Davenport, Karen W.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Tapia, Roxanne; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Yasawong, Montri; Rohde, Manfred; Tindall, Brian; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Ferrimonas balearica (Rossello-Mora et al. 1996) is the type species of the genus Ferrimonas, which belongs to the gammaproteobacterial family Ferrimonadaceae. The species is a Gram-negative, motile, facultatively anaerobic and non-spore-forming bacterium, which is of special interest because it is a chemoorganotroph and has a strictly respiratory metabolism with oxygen, nitrate, Fe(III)-oxyhydroxide, Fe(III)-citrate, MnO2, selenate, selenite and thiosulfate as electron acceptors. This is the first completed genome sequence of a member of the genus Ferrimonas and also the first sequence from a member of the family Ferrimonadaceae. The 4,279,159 bp long genome with its 3,803 protein-coding and 144 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Complete genome sequence of Intrasporangium calvum type strain (7 KIPT)

    SciTech Connect

    Glavina Del Rio, Tijana; Chertkov, Olga; Yasawong, Montri; Lucas, Susan; Deshpande, Shweta; Cheng, Jan-Fang; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Pukall, Rudiger; Sikorski, Johannes; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Intrasporangium calvum Kalakoutskii et al. 1967 is the type species of the genus Intrasporangium, which belongs to the actinobacterial family Intrasporangiaceae. The species is a Gram-positive bacterium that forms a branching mycelium, which tends to break into irregular fragments. The mycelium of this strain may bear intercalary vesicles but does not contain spores. The strain described in this study is an airborne organism that was isolated from a school dining room in 1967. One particularly interesting feature of I. calvum is that the type of its menaquinone is different from all other representatives of the family Intrasporangiaceae. This is the first completed genome sequence from a member of the genus Intrasporangium and also the first sequence from the family Intrasporangiaceae. The 4,024,382 bp long genome with its 3,653 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. A novel DNA sequence motif in human and mouse genomes

    PubMed Central

    Zhang, Shilu; Du, Fang; Ji, Hongkai

    2015-01-01

    We report a novel DNA sequence motif in human and mouse genomes. This motif has several interesting features indicating that it is highly likely to be an unknown functional sequence element. The motif is highly enriched in promoter regions. Locations of the motif sites in the genome have a strong tendency to be clustered together. Motif sites are associated with increased phylogenetic conservation as well as elevated DNase I hypersensitivity (DHS) in ENCODE cell lines. Clustered motif sites are found in promoter regions of a substantial fraction of the protein-coding genes in the genome. Altogether, these features indicate that the motif may have important functions associated with a large number of genes. PMID:25990515

  12. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    SciTech Connect

    Lamour, Kurt H; McDonald, W Hayes; Savidor, Alon

    2006-01-01

    Genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, suggest a photosynthetic past and reveal recent massive expansion and diversification of potential pathogenicity gene families. Abstract: Draft genome sequences of the soybean pathogen, Phytophthora sojae, and the sudden oak death pathogen, Phytophthora ramorum, have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors and, in particular, a superfamily of 700 proteins with similarity to known oomycete avirulence genes.

  13. Genome sequence and description of Corynebacterium ihumii sp. nov.

    PubMed Central

    Padmanabhan, Roshan; Dubourg, Grégory; Lagier, Jean-Christophe; Couderc, Carine; Michelle, Caroline; Raoult, Didier; Fournier, Pierre-Edouard

    2014-01-01

    Corynebacterium ihumii strain GD7T sp. nov. is proposed as the type strain of a new species, which belongs to the family Corynebacteriaceae of the class Actinobacteria. This strain was isolated from the fecal flora of a 62-year-old male patient as part of the culturomics study. Corynebacterium ihumii is a Gram-positive, facultatively anaerobic, nonsporulating bacillus. Here, we describe the features of this organism, together with the high-quality draft genome sequence, its annotation, and a comparison with other members of the genus Corynebacterium. The C. ihumii genome is 2,232,265 bp long (one chromosome but no plasmid) and contains 2,125 protein-coding and 53 RNA genes, including 4 rRNA genes. The whole-genome shotgun sequence of Corynebacterium ihumii strain GD7T sp. nov. has been deposited in EMBL under accession number GCA_000403725. PMID:25197488

  14. Complete genome sequence of Intrasporangium calvum type strain (7 KIPT)

    PubMed Central

    Del Rio, Tijana Glavina; Chertkov, Olga; Yasawong, Montri; Lucas, Susan; Deshpande, Shweta; Cheng, Jan-Fang; Detter, Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Pukall, Rüdiger; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2010-01-01

    Intrasporangium calvum Kalakoutskii et al. 1967 is the type species of the genus Intrasporangium, which belongs to the actinobacterial family Intrasporangiaceae. The species is a Gram-positive bacterium that forms a branching mycelium, which tends to break into irregular fragments. The mycelium of this strain may bear intercalary vesicles but does not contain spores. The strain described in this study is an airborne organism that was isolated from a school dining room in 1967. One particularly interesting feature of I. calvum is that the type of its menaquinone is different from all other representatives of the family Intrasporangiaceae. This is the first completed genome sequence from a member of the genus Intrasporangium and also the first sequence from the family Intrasporangiaceae. The 4,024,382 bp long genome with its 3,653 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304734

  15. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

    PubMed Central

    Leggett, Richard M.; Ramirez-Gonzalez, Ricardo H.; Clavijo, Bernardo J.; Waite, Darren; Davey, Robert P.

    2013-01-01

    The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centers that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform Quality Control (QC) bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design. PMID:24381581

  16. Whole genome shotgun sequencing guided by bioinformatics pipelines--an optimized approach for an established technique.

    PubMed

    Kaiser, Olaf; Bartels, Daniela; Bekel, Thomas; Goesmann, Alexander; Kespohl, Sebastian; Pühler, Alfred; Meyer, Folker

    2003-12-19

    While the sequencing of bacterial genomes has become a routine procedure at major sequencing centers, there are still a number of genome projects at small- or medium-size facilities. For these facilities a maximum of control over sequencing, assembling and finishing is essential. At the same time, facilities have to be able to co-operate at minimum costs for the overall project. We have established a pipeline for the distributed sequencing of Alcanivorax borkumensis SK2, Azoarcus sp. BH72, Clavibacter michiganensis subsp. michiganensis NCPPB382, Sorangium cellulosum So ce56 and Xanthomonas campestris pv. vesicatoria 85-10. Our pipeline relies on standard tools (e.g. PHRED/PHRAP, CAP3 and Consed/Autofinish) wherever possible, supplementing them with new tools (BioMake and BACCardI) to achieve the aims described above. PMID:14651855

  17. Genome Sequences of Oblitimonas alkaliphila gen. nov. sp. nov. (Proposed), a Novel Bacterium of the Pseudomonadaceae Family

    PubMed Central

    Lauer, Ana C.; Humrighouse, Ben W.; Emery, Brian; Drobish, Adam; Juieng, Phalasy; Loparev, Vladimir; McQuiston, John R.

    2015-01-01

    Results obtained through 16S rRNA gene sequencing and phenotypic testing of eight related, but unidentified, isolates located in a historical collection at the Centers for Disease Control and Prevention suggested that these isolates belong to a novel genus of bacteria. The genomes of the bacteria, to be named Oblitimonas alkaliphila gen. nov. sp. nov., were sequenced using Illumina technology. Closed genomes were produced for all eight isolates. PMID:26679585

  18. The complete mitochondrial genome sequence of Schizothorax dolichonema (Cypriniformes: Cyprinidae).

    PubMed

    Yue, Xingjian; Zhou, Chuanjiang; Shi, Jinrong; Zou, Yuanchao

    2016-01-01

    The complete mitochondrial genome of Schizothorax dolichonema has been sequenced. It has a total length of 16,583 bp and contains 22 tRNA genes, 13 protein-coding genes, 2 rRNA genes and 2 non-coding regions: the origin of light-strand replication and the control region. The gene order and composition are similar to those of most other vertebrates. Most of the genes are encoded on the heavy strand, except for eight tRNA genes and the ND6 gene. The mitogenome sequence of S. dolichonema would contribute to a better understanding of the biogeography and evolution of Schizothoracine fishes. PMID:24617487

  19. Comprehensive Mitochondrial Genome Analysis by Massively Parallel Sequencing.

    PubMed

    Palculict, Meagan E; Zhang, Victor Wei; Wong, Lee-Jun; Wang, Jing

    2016-01-01

    Next-generation sequencing (NGS) based on massively parallel sequencing (MPS) of the entire 16,569 bp mitochondrial genome generates thousands of reads for each nucleotide position. The high-throughput sequence data generated allow the detection of mitochondrial DNA (mtDNA) point mutations and deletions with the ability to accurately quantify the mtDNA point mutation heteroplasmy and to determine the deletion breakpoints. In addition, this method is particularly sensitive for the detection of low-level mtDNA large deletions and multiple deletions. It is by far the most powerful tool for molecular diagnosis of mtDNA disorders. PMID:26530670
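
    The heteroplasmy quantification mentioned in this abstract comes down to a ratio of read counts at one position. The following is a minimal Python sketch of that arithmetic only, not the authors' pipeline; the function name, the pileup string, and the read counts are hypothetical values chosen for illustration.

```python
from collections import Counter


def heteroplasmy_fraction(pileup_bases: str, reference_base: str) -> dict:
    """Fraction of reads supporting each non-reference base at one position.

    `pileup_bases` is a hypothetical string holding one base call per read
    covering the position; anything other than A, C, G or T is ignored.
    """
    counts = Counter(b for b in pileup_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no usable base calls at this position")
    ref = reference_base.upper()
    return {base: n / depth for base, n in counts.items() if base != ref}


# Toy data (an assumption): 1,000 reads, 50 of which carry G instead of
# the reference A, i.e. 5% heteroplasmy.
toy_pileup = "A" * 950 + "G" * 50
fractions = heteroplasmy_fraction(toy_pileup, "A")
print(fractions)
```

    With thousands of reads per position, as the abstract describes, even a low-level variant is backed by many reads, which is why deep coverage helps quantification.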

  20. The complete mitochondrial genome sequence of Platypharodon extremus (Cypriniformes: Cyprinidae).

    PubMed

    Chen, Juan; Du, Yurong; Guo, Xinyi; Xie, Ling; Zhang, Xuze; Ji, Yinfa; Pang, Bo; Guo, Songchang; Qi, Delin

    2016-03-01

    Platypharodon extremus is a monotypic species of Schizothoracine fishes, and it was listed as an Endangered species in the "China Red Data Book (Pisces)" and as Vulnerable (V) by the National Environmental Protection Agency and Endangered Species Scientific Commission. So far, little mitochondrial genome information has been described for this genus. In this study, we obtained the complete mitochondrial genome sequence of this species. The mitogenome is 16,668 bp in length and consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 2 noncoding regions. The base composition of this mitochondrial genome is 28.6% A, 27.3% T, 18.2% G, 25.9% C, with a high A + T content (55.9%). The complete mitochondrial genome of P. extremus would be of great utility in the phylogenetic analysis of the schizothoracine fishes and would also provide useful insights into deeper problems of phylogenetic analysis. PMID:25162626
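
    Base composition and A + T content, as reported in this abstract, are simple per-base percentages. The sketch below shows one way to compute them in Python; the function names and the toy sequence are hypothetical, and only the four percentages in the final check come from the abstract.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition: dict) -> float:
    """A + T content in percent, given a composition from base_composition()."""
    return composition["A"] + composition["T"]


# Toy sequence (an assumption, not real mitogenome data): 3 A, 3 T, 2 G, 2 C.
toy_composition = base_composition("AATTGCATGC")
print(toy_composition, at_content(toy_composition))

# Consistency check on the percentages reported in the abstract.
reported = {"A": 28.6, "T": 27.3, "G": 18.2, "C": 25.9}
print(round(reported["A"] + reported["T"], 1))  # A + T content
print(round(sum(reported.values()), 1))         # should total 100
```

    The reported values are internally consistent: A and T sum to the stated 55.9%, and the four bases together sum to 100%.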

  1. Mitochondrial DNA genome sequence of Ili marinka (Schizothorax pseudoaksaiensis).

    PubMed

    Luan, Pei-Xian; Lu, Jian-Guo; Xue, Shu-Qun; Zhang, Xiao-Feng; Peng, Li-Na; Sun, Xiao-Wen

    2016-03-01

    Ili marinka (Schizothorax pseudoaksaiensis) belongs to the family Cyprinidae and is found only in the Ili River in the Xinjiang Uygur Autonomous Region. In this study, we report the complete sequence of the mitochondrial genome of S. pseudoaksaiensis. The genome is 16,586 bp in length and consists of 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes, and the non-coding control region (D-loop). The complete mitochondrial genome base composition is 29.80% for A, 17.85% for G, 25.31% for T, and 27.04% for C, with a slight A+T bias of 55.11%. The mitochondrial genome can contribute to studies on the genetic diversity and conservation of S. pseudoaksaiensis. PMID:25208177

  2. The complete mitochondrial genome sequence of Helicophagella melanura (Diptera: Sarcophagidae).

    PubMed

    Zhang, Changquan; Fu, Xiaoliang; Zhu, Zhenyu; Xie, Kai; Guo, Yadong

    2014-12-01

    The mitochondrial genome of Helicophagella melanura, a representative of the Sarcophagidae family, was completely sequenced for the first time. The genome is a double-stranded circular molecule of 15,190 bp in length, including 37 genes and 1 non-coding AT-rich region. The gene contents of the mtDNA were identical to those observed in the ancestral arthropod. The total base composition of the Helicophagella melanura mitochondrial genome is 39.42% for A, 36.22% for T, 14.71% for C and 9.65% for G, in the order A > T > C > G. The mitochondrial genome data of Helicophagella melanura provide useful markers for species discrimination in forensic entomology and for phylogenetic analysis. PMID:25471440

  3. Co-barcoded sequence reads from long DNA fragments: a cost-effective solution for “perfect genome” sequencing

    PubMed Central

    Peters, Brock A.; Liu, Jia; Drmanac, Radoje

    2015-01-01

    Next generation sequencing (NGS) technologies, primarily based on massively parallel sequencing, have touched and radically changed almost all aspects of research worldwide. These technologies have allowed for the rapid analysis, to date, of the genomes of more than 2,000 different species. In humans, NGS has arguably had the largest impact. Over 100,000 genomes of individual humans (based on various estimates) have been sequenced allowing for deep insights into what makes individuals and families unique and what causes disease in each of us. Despite all of this progress, the current state of the art in sequence technology is far from generating a “perfect genome” sequence and much remains to be understood in the biology of human and other organisms’ genomes. In the article that follows, we outline why the “perfect genome” in humans is important, what is lacking from current human whole genome sequences, and a potential strategy for achieving the “perfect genome” in a cost effective manner. PMID:25642240

  4. The complete genome sequence of Mycobacterium avium subspecies paratuberculosis.

    PubMed

    Li, Lingling; Bannantine, John P; Zhang, Qing; Amonsin, Alongkorn; May, Barbara J; Alt, David; Banerji, Nilanjana; Kanjilal, Sagarika; Kapur, Vivek

    2005-08-30

    We describe here the complete genome sequence of a common clone of Mycobacterium avium subspecies paratuberculosis (Map) strain K-10, the causative agent of Johne's disease in cattle and other ruminants. The K-10 genome is a single circular chromosome of 4,829,781 base pairs and encodes 4,350 predicted ORFs, 45 tRNAs, and one rRNA operon. In silico analysis identified >3,000 genes with homologs to the human pathogen, M. tuberculosis (Mtb), and 161 unique genomic regions that encode 39 previously unknown Map genes. Analysis of nucleotide substitution rates with Mtb homologs suggests overall strong selection for a vast majority of these shared mycobacterial genes, with only 68 ORFs with a synonymous to nonsynonymous substitution ratio of >2. Comparative sequence analysis reveals several noteworthy features of the K-10 genome including: a relative paucity of the PE/PPE family of sequences that are implicated as virulence factors and known to be immunostimulatory during Mtb infection; truncation in the EntE domain of a salicyl-AMP ligase (MbtA), the first gene in the mycobactin biosynthesis gene cluster, providing a possible explanation for mycobactin dependence of Map; and Map-specific sequences that are likely to serve as potential targets for sensitive and specific molecular and immunologic diagnostic tests. Taken together, the availability of the complete genome sequence offers a foundation for the study of the genetic basis for virulence and physiology in Map and enables the development of new generations of diagnostic tests for bovine Johne's disease. PMID:16116077

  5. Realistic artificial DNA sequences as negative controls for computational genomics

    PubMed Central

    Caballero, Juan; Smit, Arian F. A.; Hood, Leroy; Glusman, Gustavo

    2014-01-01

    A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/. PMID:24803667
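
    The 'shuffling' baseline that this abstract contrasts with its own method can be sketched in a few lines of Python. This illustrates only the simple shuffle, not the authors' artificial-sequence generator; the function name, seed, and toy sequence are hypothetical.

```python
import random
from collections import Counter


def shuffled_control(sequence: str, seed: int = 0) -> str:
    """Return a random permutation of the bases in `sequence`.

    The shuffle keeps the single-base composition of the input but destroys
    higher-order structure such as repeats and local complexity.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


# Toy input (an assumption): a short sequence containing a tandem AT repeat.
real = "GCGCATATATATATATGCGCTTAACCGG"
control = shuffled_control(real)

# The control has the same length and base counts as the real sequence.
print(len(control) == len(real), Counter(control) == Counter(real))
```

    Because a plain shuffle preserves only base counts, the resulting controls lack the repeat content and complexity of real intergenic sequence, which is the limitation the abstract's modeled artificial sequences are meant to address.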

  6. Draft Genome Sequence of Colletotrichum sublineola, a Destructive Pathogen of Cultivated Sorghum.

    PubMed

    Baroncelli, Riccardo; Sanz-Martín, José María; Rech, Gabriel E; Sukno, Serenella A; Thon, Michael R

    2014-01-01

    Colletotrichum sublineola is a filamentous fungus that causes anthracnose disease on sorghum. We report a draft whole-genome shotgun sequence and gene annotation of the nuclear genome of this fungus using Illumina sequencing. PMID:24926053

  7. Complete Genome Sequence of Parascardovia denticolens JCM 12538T, Isolated from Human Dental Caries

    PubMed Central

    Oshima, Kenshiro; Hayashi, Jun-ichiro; Toh, Hidehiro; Nakano, Akiyo; Shindo, Chie; Komiya, Keiko; Honda, Kenya; Hattori, Masahira

    2015-01-01

    Parascardovia denticolens JCM 12538T was isolated from human dental caries. Here, we report the complete genome sequence of this organism. This paper is the first report demonstrating the completely sequenced and assembled genome of P. denticolens. PMID:25977413

  8. Draft Genome Sequence of Bifidobacterium aesculapii DSM 26737T, Isolated from Feces of Baby Common Marmoset

    PubMed Central

    Toh, Hidehiro; Yamazaki, Yumiko; Tashiro, Kosuke; Kawarai, Shinpei; Oshima, Kenshiro; Nakano, Akiyo; Kim, Co Nguyen Thi; Mimura, Iyo; Arakawa, Kensuke; Iriki, Atsushi; Kikusui, Takefumi

    2015-01-01

    Bifidobacterium aesculapii DSM 26737T was isolated from feces of baby common marmoset. Here, we report the draft genome sequence of this organism. This paper is the first published report of the genomic sequence of B. aesculapii. PMID:26659692

  9. Lessons Learned From 24 Completely Sequenced AML Genomes - Timothy Ley, TCGA Scientific Symposium 2011

    Cancer.gov

    Lessons Learned From 24 Completely Sequenced AML Genomes - Timothy Ley, TCGA Scientific Symposium 2011

  10. Draft Genome Sequence of the Versatile Alkane-Degrading Bacterium Aquabacterium sp. Strain NJ1

    PubMed Central

    Shiwa, Yuh; Yoshikawa, Hirofumi; Zylstra, Gerben J.

    2014-01-01

    The draft genome sequence of a soil bacterium, Aquabacterium sp. strain NJ1, capable of utilizing both liquid and solid alkanes, was deciphered. This is the first report of an Aquabacterium genome sequence. PMID:25477416

  11. AACR 2014: NCI/NIH-Sponsored Session: Large-Scale Genomics Data for the Research Community through the NCI Center for Cancer Genomics

    Cancer.gov

    The NCI’s Center for Cancer Genomics (CCG), which includes the Office of Cancer Genomics and The Cancer Genome Atlas Program Office, provides the research community access to large-scale molecular characterization data, which is largely sequence-based. CCG programs aim to improve patient outcome through identification of valid molecular targets and associated molecular markers (prognostic or diagnostic), in and across diseases investigated, which should ultimately lead to the rapid development of novel, more effective therapies.

  12. SeqEntropy: Genome-Wide Assessment of Repeats for Short Read Sequencing

    E-print Network

    Chen, Chaur-Chin

    analysis of human genome [1] and for rapid full genome sequencing and typing of various organisms. The 1000 Genomes Project, launched in 2008, bSeqEntropy: Genome-Wide Assessment of Repeats for Short Read Sequencing Hsueh-Ting Chu1,2 , William

  13. A Model of the Statistical Power of Comparative Genome Sequence Analysis

    E-print Network

    Batzoglou, Serafim

    by their evolutionary conservation [1,2,3]. It will be instrumental for achieving the goal of the Human Genome Project to comprehensively identify functional elements in the human genome [4]. How many comparative genome sequences do we not contribute significant information to human genome analysis? Since sequencing is expensive and capacity

  14. Development of peanut expressed sequence tag-based genomic resources and tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    U.S. Peanut Genome Initiative (PGI) has widely recognized the need for peanut genome tools and resources development for mitigating peanut allergens and food safety. Genomics such as Expressed Sequence Tag (EST), microarray technologies, and whole genome sequencing provides robotic tools for profili...

  15. Development of peanut EST (expressed sequence tag)-based genomic resources and tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    U.S. Peanut Genome Initiative (PGI) has widely recognized the need for peanut genome tools and resources development for mitigating peanut allergens and food safety. Genomics such as Expressed Sequence Tag (EST), microarray technologies, and whole genome sequencing provides robotic tools for profili...

  16. Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum

    PubMed Central

    Grativol, Clícia; Regulski, Michael; Bertalan, Marcelo; McCombie, W. Richard; da Silva, Felipe Rodrigues; Neto, Adhemar Zerlotini; Vicentini, Renato; Farinelli, Laurent; Hemerly, Adriana Silva; Martienssen, Robert A.; Ferreira, Paulo Cavalcanti Gomes

    2015-01-01

    SUMMARY Many economically important crops have large and complex genomes, which hampers sequencing of their genomes by standard methods such as WGS. Large tracts of methylated repeats occur in plant genomes, interspersed with hypomethylated gene-rich regions. Gene enrichment strategies based on methylation profile offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration (MF) with McrBC digestion to enrich for euchromatic regions of the sugarcane genome. To verify the efficiency of MF and the assembly quality of sequences submitted to the gene-enrichment strategy, we have compared assemblies using MF and unfiltered (UF) libraries. MF allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5 times more scaffolds and 1.7 times more assembled Mb compared to unfiltered scaffolds. The coverage of sorghum CDS by MF scaffolds was at least 36% higher than by UF scaffolds. Using MF technology, we increased by 134X the coverage of genic regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds covering all genes in sugarcane BACs, 97.2% of sugarcane ESTs, 92.7% of sugarcane RNA-seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds encoding enzymes of the sucrose/starch pathway discovered 291 SNPs in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes were also identified in the MF scaffolds. The information obtained from the MF dataset provides a valuable tool for genomic research in the genus Saccharum and improvement of sugarcane as a biofuel crop. PMID:24773339

  17. Complete genome sequence of Methanospirillum hungatei type strain JF1.

    PubMed

    Gunsalus, Robert P; Cook, Lauren E; Crable, Bryan; Rohlin, Lars; McDonald, Erin; Mouttaki, Housna; Sieber, Jessica R; Poweleit, Nicole; Zhou, Hong; Lapidus, Alla L; Daligault, Hajnalka Erzsebet; Land, Miriam; Gilna, Paul; Ivanova, Natalia; Kyrpides, Nikos; Culley, David E; McInerney, Michael J

    2016-01-01

    Methanospirillum hungatei strain JF1 (DSM 864) is a methane-producing archaeon and is the type species of the genus Methanospirillum, which belongs to the family Methanospirillaceae within the order Methanomicrobiales. Its genome was selected for sequencing due to its ability to utilize hydrogen and carbon dioxide and/or formate as a sole source of energy. Ecologically, M. hungatei functions as the hydrogen- and/or formate-using partner with many species of syntrophic bacteria. Its morphology is distinct from other methanogens with the ability to form long chains of cells (up to 100 μm in length), which are enclosed within a sheath-like structure, and terminal cells with polar flagella. The genome of M. hungatei strain JF1 is the first completely sequenced genome of the family Methanospirillaceae, and it has a circular genome of 3,544,738 bp containing 3,239 protein coding and 68 RNA genes. The large genome of M. hungatei JF1 suggests the presence of unrecognized biochemical/physiological properties that likely extend to the other Methanospirillaceae and include the ability to form the unusual sheath-like structure and to successfully interact with syntrophic bacteria. PMID:26744606

  18. Research participants' attitudes towards the confidentiality of genomic sequence information

    PubMed Central

    Jamal, Leila; Sapp, Julie C; Lewis, Katie; Yanes, Tatiane; Facio, Flavia M; Biesecker, Leslie G; Biesecker, Barbara B

    2014-01-01

    Respecting the confidentiality of personal data contributed to genomic studies is an important issue for researchers using genomic sequencing in humans. Although most studies adhere to rules of confidentiality, there are different conceptions of confidentiality and why it is important. The resulting ambiguity obscures what is at stake when making tradeoffs between data protection and other goals in research, such as transparency, reciprocity, and public benefit. Few studies have examined why participants in genomic research care about how their information is used. To explore this topic, we conducted semi-structured phone interviews with 30 participants in two National Institutes of Health research protocols using genomic sequencing. Our results show that research participants value confidentiality as a form of control over information about themselves. To the individuals we interviewed, control was valued as a safeguard against discrimination in a climate of uncertainty about future uses of individual genome data. Attitudes towards data sharing were related to the goals of research and details of participants' personal lives. Expectations of confidentiality, trust in researchers, and a desire to advance science were common reasons for willingness to share identifiable data with investigators. Nearly all participants were comfortable sharing personal data that had been de-identified. These findings suggest that views about confidentiality and data sharing are highly nuanced and are related to the perceived benefits of joining a research study. PMID:24281371

  19. Genome Sequencing Reveals a Phage in Helicobacter pylori

    PubMed Central

    Lehours, Philippe; Vale, Filipa F.; Bjursell, Magnus K.; Melefors, Ojar; Advani, Reza; Glavas, Steve; Guegueniat, Julia; Gontier, Etienne; Lacomme, Sabrina; Alves Matos, António; Menard, Armelle; Mégraud, Francis; Engstrand, Lars; Andersson, Anders F.

    2011-01-01

    ABSTRACT Helicobacter pylori chronically infects the gastric mucosa in more than half of the human population; in a subset of this population, its presence is associated with development of severe disease, such as gastric cancer. Genomic analysis of several strains has revealed an extensive H. pylori pan-genome, likely to grow as more genomes are sampled. Here we describe the draft genome sequence (63 contigs; 26× mean coverage) of H. pylori strain B45, isolated from a patient with gastric mucosa-associated lymphoid tissue (MALT) lymphoma. The major finding was a 24.6-kb prophage integrated in the bacterial genome. The prophage shares most of its genes (22/27) with prophage region II of Helicobacter acinonychis strain Sheeba. After UV treatment of liquid cultures, circular DNA carrying the prophage integrase gene could be detected, and intracellular tailed phage-like particles were observed in H. pylori cells by transmission electron microscopy, indicating that phage production can be induced from the prophage. PCR amplification and sequencing of the integrase gene from 341 H. pylori strains from different geographic regions revealed a high prevalence of the prophage (21.4%). Phylogenetic reconstruction showed four distinct clusters in the integrase gene, three of which tended to be specific for geographic regions. Our study implies that phages may play important roles in the ecology and evolution of H. pylori. PMID:22086490

  20. Insights into hominid evolution from the gorilla genome sequence

    PubMed Central

    Scally, Aylwyn; Dutheil, Julien Y.; Hillier, LaDeana W.; Jordan, Greg E.; Goodhead, Ian; Herrero, Javier; Hobolth, Asger; Lappalainen, Tuuli; Mailund, Thomas; Marques-Bonet, Tomas; McCarthy, Shane; Montgomery, Stephen H.; Schwalie, Petra C.; Tang, Y. Amy; Ward, Michelle C.; Xue, Yali; Yngvadottir, Bryndis; Alkan, Can; Andersen, Lars N.; Ayub, Qasim; Ball, Edward V.; Beal, Kathryn; Bradley, Brenda J.; Chen, Yuan; Clee, Chris M.; Fitzgerald, Stephen; Graves, Tina A.; Gu, Yong; Heath, Paul; Heger, Andreas; Karakoc, Emre; Kolb-Kokocinski, Anja; Laird, Gavin K.; Lunter, Gerton; Meader, Stephen; Mort, Matthew; Mullikin, James C.; Munch, Kasper; O’Connor, Timothy D.; Phillips, Andrew D.; Prado-Martinez, Javier; Rogers, Anthony S.; Sajjadian, Saba; Schmidt, Dominic; Shaw, Katy; Simpson, Jared T.; Stenson, Peter D.; Turner, Daniel J.; Vigilant, Linda; Vilella, Albert J.; Whitener, Weldon; Zhu, Baoli; Cooper, David N.; de Jong, Pieter; Dermitzakis, Emmanouil T.; Eichler, Evan E.; Flicek, Paul; Goldman, Nick; Mundy, Nicholas I.; Ning, Zemin; Odom, Duncan T.; Ponting, Chris P.; Quail, Michael A.; Ryder, Oliver A.; Searle, Stephen M.; Warren, Wesley C.; Wilson, Richard K.; Schierup, Mikkel H.; Rogers, Jane; Tyler-Smith, Chris; Durbin, Richard

    2012-01-01

    Summary Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago (Mya). In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution. PMID:22398555

  1. Complete genome sequence of Actinosynnema mirum type strain (101T)

    SciTech Connect

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam; Chen, Feng; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia; Brettin, Thomas; Detter, John C.; Han, Cliff; Chain, Patrick; Tindall, Brian; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actinosynnemataceae, a rapidly growing family within the actinobacterial suborder Pseudonocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. The complete mitochondrial genome sequence of Eimeria magna (Apicomplexa: Coccidia).

    PubMed

    Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Liu, Guo-Hua; Wang, Chun-Ren; Zhu, Xing-Quan

    2015-10-01

    In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Eimeria magna from rabbits for the first time, and compared its gene content and genome organization with those of seven Eimeria spp. from domestic chickens. The size of the complete mt genome sequence of E. magna is 6249 bp, which consists of 3 protein-coding genes (cytb, cox1 and cox3), 12 gene fragments for the large subunit (LSU) rRNA, and 7 gene fragments for the small subunit (SSU) rRNA, without transfer RNA genes, in accordance with that of Eimeria spp. from chickens. The putative direction of translation for the three genes (cytb, cox1 and cox3) was the same as in Eimeria species from domestic chickens. The A + T content is 65.16% for the E. magna mt genome (29.73% A, 35.43% T, 17.09% G and 17.75% C). The E. magna mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Eimeria spp. and has implications for the molecular diagnosis and control of rabbit coccidiosis. PMID:24328820
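The base composition reported in the abstract above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names `base_composition` and `at_content` are hypothetical and not taken from the study, and the percentages are the ones quoted in the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of each of A, C, G, T in a DNA string."""
    seq = seq.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(pct):
    """A + T percentage from a per-base percentage mapping, rounded to 2 places."""
    return round(pct["A"] + pct["T"], 2)


# Percentages quoted in the abstract for the E. magna mt genome.
reported = {"A": 29.73, "T": 35.43, "G": 17.09, "C": 17.75}
print(at_content(reported))               # A + T share
print(round(sum(reported.values()), 2))   # all four bases together
```

Applied to a real sequence, `base_composition` would be run on the GenBank record first and its result passed to `at_content`.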

  3. MIR retrotransposon sequences provide insulators to the human genome.

    PubMed

    Wang, Jianrong; Vicente-García, Cristina; Seruggia, Davide; Moltó, Eduardo; Fernandez-Miñán, Ana; Neto, Ana; Lee, Elbert; Gómez-Skarmeta, José Luis; Montoliu, Lluís; Lunyak, Victoria V; Jordan, I King

    2015-08-11

    Insulators are regulatory elements that help to organize eukaryotic chromatin via enhancer-blocking and chromatin barrier activity. Although there are several examples of transposable element (TE)-derived insulators, the contribution of TEs to human insulators has not been systematically explored. Mammalian-wide interspersed repeats (MIRs) are a conserved family of TEs that have substantial regulatory capacity and share sequence characteristics with tRNA-related insulators. We sought to evaluate whether MIRs can serve as insulators in the human genome. We applied a bioinformatic screen using genome sequence and functional genomic data from CD4(+) T cells to identify a set of 1,178 predicted MIR insulators genome-wide. These predicted MIR insulators were computationally tested to serve as chromatin barriers and regulators of gene expression in CD4(+) T cells. The activity of predicted MIR insulators was experimentally validated using in vitro and in vivo enhancer-blocking assays. MIR insulators are enriched around genes of the T-cell receptor pathway and reside at T-cell-specific boundaries of repressive and active chromatin. A total of 58% of the MIR insulators predicted here show evidence of T-cell-specific chromatin barrier and gene regulatory activity. MIR insulators appear to be CCCTC-binding factor (CTCF) independent and show a distinct local chromatin environment with marked peaks for RNA Pol III and a number of histone modifications, suggesting that MIR insulators recruit transcriptional complexes and chromatin modifying enzymes in situ to help establish chromatin and regulatory domains in the human genome. The provisioning of insulators by MIRs across the human genome suggests a specific mechanism by which TE sequences can be used to modulate gene regulatory networks. PMID:26216945

  4. Complete Genome Sequence of the Gut Commensal and Laboratory Strain Enterococcus faecium 64/3.

    PubMed

    Bender, Jennifer K; Fiedler, Stefan; Klare, Ingo; Werner, Guido

    2015-01-01

    The genome sequence of the commensal and widely used laboratory strain Enterococcus faecium 64/3 was resolved by means of Pacific Biosciences and Illumina whole-genome sequencing. The genome comprises 2,575,333 bp with 2,382 coding sequences as assigned by NCBI. PMID:26586871

  5. Complete Genome Sequence of the Gut Commensal and Laboratory Strain Enterococcus faecium 64/3

    PubMed Central

    Fiedler, Stefan; Klare, Ingo; Werner, Guido

    2015-01-01

    The genome sequence of the commensal and widely used laboratory strain Enterococcus faecium 64/3 was resolved by means of Pacific Biosciences and Illumina whole-genome sequencing. The genome comprises 2,575,333 bp with 2,382 coding sequences as assigned by NCBI. PMID:26586871

  6. Draft genome sequences of seven isolates of Phytophthora ramorum EU2 from Northern Ireland

    PubMed Central

    Mata Saez, Lourdes de la; McCracken, Alistair R.; Cooke, Louise R.; O'Neill, Paul; Grant, Murray; Studholme, David J.

    2015-01-01

    Here we present draft-quality genome sequence assemblies for the oomycete Phytophthora ramorum genetic lineage EU2. We sequenced genomes of seven isolates collected in Northern Ireland between 2010 and 2012. Multiple genome sequences from P. ramorum EU2 will be valuable for identifying genetic variation within the clonal lineage that can be useful for tracking its spread.

  7. Initial sequence of the chimpanzee genome and comparison with the human

    E-print Network

    Reich, David

    Initial sequence of the chimpanzee genome and comparison with the human genome The Chimpanzee Sequencing and Analysis Consortium* Here we present a draft genome sequence of the common chimpanzee (Pan of the genetic differences that have accumulated since the human and chimpanzee species diverged from our common

  8. Complete Genome Sequence of Pelosinus sp. Strain UFO1 Assembled Using Single-Molecule Real-Time DNA Sequencing Technology

    PubMed Central

    Brown, Steven D.; Utturkar, Sagar M.; Magnuson, Timothy S.; Ray, Allison E.; Poole, Farris L.; Lancaster, W. Andrew; Thorgersen, Michael P.; Adams, Michael W. W.

    2014-01-01

    Pelosinus species can reduce metals such as Fe(III), U(VI), and Cr(VI) and have been isolated from diverse geographical regions. Five draft genome sequences have been published. We report the complete genome sequence for Pelosinus sp. strain UFO1 using only PacBio DNA sequence data and without manual finishing. PMID:25189589

  9. The complete chloroplast genome sequence of desert poplar (Populus euphratica).

    PubMed

    Zhang, Qun-Jie; Gao, Li-Zhi

    2016-01-01

    The complete chloroplast sequence of the desert poplar (Populus euphratica), a plant well-adapted to salt stress, was determined in this study. The genome consists of 156,766 bp containing a pair of inverted repeats (IRs) of 16,591 bp separated by a large single-copy region and a small single-copy region of 84,888 bp and 27,646 bp, respectively. The chloroplast genome contains 130 known genes, including 89 protein-coding genes, 8 ribosomal RNA genes, and 37 tRNA genes; 18 of these are located in the inverted repeat region. PMID:24810062

  10. Genome sequence of the tobacco bacterial wilt pathogen Ralstonia solanacearum.

    PubMed

    Li, Zefeng; Wu, Sanling; Bai, Xuefei; Liu, Yun; Lu, Jianfei; Liu, Yong; Xiao, Bingguang; Lu, Xiuping; Fan, Longjiang

    2011-11-01

    Ralstonia solanacearum is a causal agent of plant bacterial wilt with thousands of distinct strains in a heterogeneous species complex. Here we report the genome sequence of a phylotype IB strain, Y45, isolated from tobacco (Nicotiana tabacum) in China. Compared with the published genomes of eight strains which were isolated from other hosts and habitats, 794 specific genes and many rearrangements/inversion events were identified in the tobacco strain, demonstrating that this strain represents an important node within the R. solanacearum complex. PMID:21994922

  11. Complete genome sequence of Croceibacter bacteriophage P2559S.

    PubMed

    Kang, Ilnam; Kang, Dongmin; Cho, Jang-Cheon

    2012-08-01

    Croceibacter atlanticus HTCC2559(T), a marine bacterium isolated from the Sargasso Sea, is a phylogenetically unique member of the family Flavobacteriaceae. Strain HTCC2559(T) possesses genes related to interaction with primary producers, which makes studies on bacteriophages infecting the strain interesting. Here we report the genome sequence of bacteriophage P2559S, which was isolated off the coast of the Republic of Korea and lytically infects HTCC2559(T). Many genes predicted in the P2559S genome had their homologs in Bacteroides phages. PMID:22843867

  12. Draft genome sequence of Acidithiobacillus ferrooxidans YQH-1

    PubMed Central

    Yan, Lei; Zhang, Shuang; Wang, Weidong; Hu, Huixin; Wang, Yanjie; Yu, Gaobo; Chen, Peng

    2015-01-01

    Acidithiobacillus ferrooxidans YQH-1 is a moderate acidophilic bacterium isolated from a river in a volcano of Northeast China. Here, we describe the draft genome of strain YQH-1, which was assembled into 123 contigs containing 3,111,222 bp with a G + C content of 58.63%. A large number of genes related to carbon dioxide fixation, dinitrogen fixation, pH tolerance, heavy metal detoxification, and oxidative stress defense were detected. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LJBT00000000.

  13. Genome sequence and description of Mannheimia massilioguelmaensis sp. nov.

    PubMed Central

    Hadjadj, L.; Bentorki, A.A.; Michelle, C.; Amoura, K.; Djahoudi, A.; Rolain, J.-M.

    2015-01-01

    Strain MG13T sp. nov. is the type strain of Mannheimia massilioguelmaensis, a new species within the genus Mannheimia. This strain was isolated from the exudate of a skin lesion of an Algerian man. Mannheimia massilioguelmaensis is a Gram-negative, facultative anaerobic rod, member of the family Pasteurellaceae. Here we describe this organism, together with the complete genome sequence and annotation. The 2 186 813 bp long genome contains 2048 protein-coding and 55 RNA genes, including eight rRNA genes. PMID:26693284

  14. Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis

    SciTech Connect

    Tyler, Brett M.; Tripathy, Sucheta; Zhang, Xuemin; Dehal, Paramvir; Jiang, Rays H. Y.; Aerts, Andrea; Arredondo, Felipe D.; Baxter, Laura; Bensasson, Douda; Beynon, JIm L.; Chapman, Jarrod; Damasceno, Cynthia M. B.; Dorrance, Anne E.; Dou, Daolong; Dickerman, Allan W.; Dubchak, Inna L.; Garbelotto, Matteo; Gijzen, Mark; Gordon, Stuart G.; Govers, Francine; Grunwald, NIklaus J.; Huang, Wayne; Ivors, Kelly L.; Jones, Richard W.; Kamoun, Sophien; Krampis, Konstantinos; Lamour, Kurt H.; Lee, Mi-Kyung; McDonald, W. Hayes; Medina, Monica; Meijer, Harold J. G.; Nordberg, Erik K.; Maclean, Donald J.; Ospina-Giraldo, Manuel D.; Morris, Paul F.; Phuntumart, Vipaporn; Putnam, Nicholas J.; Rash, Sam; Rose, Jocelyn K. C.; Sakihama, Yasuko; Salamov, Asaf A.; Savidor, Alon; Scheuring, Chantel F.; Smith, Brian M.; Sobral, Bruno W. S.; Terry, Astrid; Torto-Alalibo, Trudy A.; Win, Joe; Xu, Zhanyou; Zhang, Hongbin; Grigoriev, Igor V.; Rokhsar, Daniel S.; Boore, Jeffrey L.

    2006-04-17

    Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oömycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oömycete avirulence genes.

  15. Draft genome sequence of Bacillus amyloliquefaciens HB-26

    PubMed Central

    Liu, Xiao-Yan; Min, Yong; Wang, Kai-Mei; Wan, Zhong-Yi; Zhang, Zhi-Gang; Cao, Chun-Xia; Zhou, Rong-Hua; Jiang, Ai-Bing; Liu, Cui-Jun; Zhang, Guang-Yang; Cheng, Xian-Liang; Zhang, Wei; Yang, Zi-Wen

    2014-01-01

    Bacillus amyloliquefaciens HB-26, a Gram-positive bacterium, was isolated from soil in China. SDS-PAGE analysis showed that this strain secreted six major protein bands of 65, 60, 55, 34, 25 and 20 kDa. A bioassay of this strain revealed specific activity against P. brassicae and nematodes. Here we describe the features of this organism, together with the draft genome sequence and annotation. The 3,989,358 bp long genome (39 contigs) contains 4,001 protein-coding genes and 80 RNA genes. PMID:25197462

  16. Coupling sequencing by hybridization (SBH) with gel sequencing for an inexpensive analysis of genes and genomes

    SciTech Connect

    Drmanac, S.; Labat, I.; Hauser, B.; Drmanac, R.

    1996-11-01

    The speed and cost of DNA sequencing are bottlenecks in the analysis of genes and genomes. Sequencing by hybridization (SBH) is a versatile method with several applications which can accelerate DNA screening, mapping and sequencing. Requirements, achievements and problems in the development of the SBH format 1 (DNA samples arrayed) are presented and schemes for its synergetic coupling with gel sequencing techniques are discussed. It appears that with one hybridization machine with 24 boxes and four ABI gel sequencers, 100-300 Mb of DNA sequence can be determined per year. Various genetic studies based on computer-assisted analysis of large collections of partial or complete DNA sequences ('sequenetics') may be achieved in this century.

  17. Physical map-assisted whole-genome shotgun sequence assemblies

    PubMed Central

    Warren, René L.; Varabei, Dmitry; Platt, Darren; Huang, Xiaoqiu; Messina, David; Yang, Shiaw-Pyng; Kronstad, James W.; Krzywinski, Martin; Warren, Wesley C.; Wallis, John W.; Hillier, LaDeana W.; Chinwalla, Asif T.; Schein, Jacqueline E.; Siddiqui, Asim S.; Marra, Marco A.; Wilson, Richard K.; Jones, Steven J.M.

    2006-01-01

    We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity. PMID:16741162

  18. Physical map-assisted whole-genome shotgun sequence assemblies.

    PubMed

    Warren, René L; Varabei, Dmitry; Platt, Darren; Huang, Xiaoqiu; Messina, David; Yang, Shiaw-Pyng; Kronstad, James W; Krzywinski, Martin; Warren, Wesley C; Wallis, John W; Hillier, LaDeana W; Chinwalla, Asif T; Schein, Jacqueline E; Siddiqui, Asim S; Marra, Marco A; Wilson, Richard K; Jones, Steven J M

    2006-06-01

    We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity. PMID:16741162

  19. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  20. Sequencing and annotation of the Ophiostoma ulmi genome

    PubMed Central

    2013-01-01

    Background: The ascomycete fungus Ophiostoma ulmi was responsible for the initial pandemic of the massively destructive Dutch elm disease in Europe and North America in early 1910. Dutch elm disease has ravaged the elm tree population globally and is a major threat to the remaining elm population. O. ulmi is also associated with valuable biomaterials applications. It was recently discovered that proteins from O. ulmi can be used for efficient transformation of amylose in the production of bioplastics. Results: We have sequenced the 31.5 Mb genome of O. ulmi using Illumina next generation sequencing. Applying both de novo and comparative genome annotation methods, we predict a total of 8639 gene models. The quality of the predicted genes was validated using a variety of data sources consisting of EST data, mRNA-seq data and orthologs from related fungal species. Sequence-based computational methods were used to identify candidate virulence-related genes. Metabolic pathways were reconstructed and highlight specific enzymes that may play a role in virulence. Conclusions: This genome sequence will be a useful resource for further research aimed at understanding the molecular mechanisms of pathogenicity by O. ulmi. It will also facilitate the identification of enzymes necessary for industrial biotransformation applications. PMID:23496816

  1. Simple sequence repeats in the Helicobacter pylori genome.

    PubMed

    Saunders, N J; Peden, J F; Hood, D W; Moxon, E R

    1998-03-01

    We describe an integrated system for the analysis of DNA sequence motifs within complete bacterial genome sequences. This system is based around ACeDB, a genome database with an integrated graphical user interface; we identify and display motifs in the context of genetic, sequence and bibliographic data. Tomb et al. (1997) previously reported the identification of contingency genes in Helicobacter pylori through their association with homopolymeric tracts and dinucleotide repeats. With this as a starting point, we validated the system by a search for this type of repeat and used the contextual information to assess the likelihood that they mediate phase variation in the associated open reading frames (ORFs). We found all of the repeats previously described, and identified 27 putative phase-variable genes (including 17 previously described). These could be divided into three groups: lipopolysaccharide (LPS) biosynthesis, cell-surface-associated proteins and DNA restriction/modification systems. Five of the putative genes did not have obvious homologues in any of the public domain sequence databases. The reading frame of some ORFs was disrupted by the presence of the repeats, including the alpha(1-2) fucosyltransferase gene, necessary for the synthesis of the Lewis Y epitope. An additional benefit of this approach is that the results of each search can be analysed further and compared with those from other genomes. This revealed that H. pylori has an unusually high frequency of homopurine:homopyrimidine repeats suggesting mechanistic biases that favour their presence and instability. PMID:9570395
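A search for homopolymeric tracts and dinucleotide repeats of the kind described above can be sketched with regular expressions. This is not the ACeDB-based system from the abstract: the function name `find_simple_repeats` and both length thresholds are assumptions chosen for illustration only.

```python
import re


def find_simple_repeats(seq, min_homopolymer=8, min_dinucleotide_units=4):
    """Return sorted (start, end, unit, copies) tuples for simple repeats.

    Thresholds are illustrative assumptions, not values from the study.
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []

    # Homopolymeric tract: one base followed by at least (min - 1) copies of itself.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homopolymer.finditer(seq):
        hits.append((m.start(), m.end(), m.group(1), m.end() - m.start()))

    # Dinucleotide repeat: a unit of two different bases, repeated in tandem.
    dinucleotide = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinucleotide_units - 1)
    )
    for m in dinucleotide.finditer(seq):
        hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // 2))

    return sorted(hits)


example = "AAT" + "G" * 9 + "A" + "CT" * 5 + "GA"
for hit in find_simple_repeats(example):
    print(hit)
```

To judge whether a repeat could mediate phase variation, each hit would still need to be placed relative to annotated ORFs, which this sketch does not attempt.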

  2. Draft genome sequence of Xanthomonas axonopodis pathovar vasculorum NCPPB 900.

    PubMed

    Harrison, James; Studholme, David J

    2014-11-01

    Xanthomonas axonopodis pathovar vasculorum strain NCPPB 900 was isolated from sugarcane on Reunion island in 1960. Consistent with its belonging to fatty-acid type D, multi-locus sequence analysis confirmed that NCPPB 900 falls within the species X. axonopodis. This genome harbours sequences similar to plasmids pXCV183 from X. campestris pv. vesicatoria 85-10 and pPHB194 from Burkholderia pseudomallei. Its repertoire of predicted effectors includes homologues of XopAA, XopAD, XopAE, XopB, XopD, XopV, XopZ, XopC and XopI and transcriptional activator-like effectors, and it is predicted to encode a novel phosphonate natural product also encoded by the genome of the phylogenetically distant X. vasicola pv. vasculorum. Availability of this novel genome sequence may facilitate the study of interactions between xanthomonads and sugarcane, a host-pathogen system that appears to have evolved several times independently within the genus Xanthomonas, and may also provide a source of target sequences for molecular detection and diagnostics. PMID:25263632

  3. Impact of Small Repeat Sequences on Bacterial Genome Evolution

    PubMed Central

    Delihas, Nicholas

    2011-01-01

    Intergenic regions of prokaryotic genomes carry multiple copies of terminal inverted repeat (TIR) sequences, the nonautonomous miniature inverted-repeat transposable element (MITE). In addition, there are the repetitive extragenic palindromic (REP) sequences that fold into a small stem loop rich in G–C bonding. And the clustered regularly interspaced short palindromic repeats (CRISPRs) display similar small stem loops but are an integral part of a complex genetic element. Other classes of repeats such as the REP2 element do not have TIRs but show other signatures. With the current availability of a large number of whole-genome sequences, many new repeat elements have been discovered. These sequences display diverse properties. Some show an intimate linkage to integrons, and at least one encodes a small RNA. Many repeats are found fused with chromosomal open reading frames, and some are located within protein coding sequences. Small repeat units appear to work hand in hand with the transcriptional and/or post-transcriptional apparatus of the cell. Functionally, they are multifaceted, and this can range from the control of gene expression, the facilitation of host/pathogen interactions, or stimulation of the mammalian immune system. The CRISPR complex displays dramatic functions such as an acquired immune system that defends against invading viruses and plasmids. Evolutionarily, mobile repeat elements may have influenced a cycle of active versus inactive genes in ancestral organisms, and some repeats are concentrated in regions of the chromosome where there is significant genomic plasticity. Changes in the abundance of genomic repeats during the evolution of an organism may have resulted in a benefit to the cell or posed a disadvantage, and some present day species may reflect a purification process. The diverse structure, eclectic functions, and evolutionary aspects of repeat elements are described. PMID:21803768

  4. IsoFinder: computational prediction of isochores in genome sequences

    PubMed Central

    Oliver, José L.; Carpena, Pedro; Hackenberg, Michael; Bernaola-Galván, Pedro

    2004-01-01

    Isochores are long genome segments homogeneous in G+C. Here, we describe an algorithm (IsoFinder) running on the web (http://bioinfo2.ugr.es/IsoF/isofinder.html) able to predict isochores at the sequence level. We move a sliding pointer from left to right along the DNA sequence. At each position of the pointer, we compute the mean G+C values to the left and to the right of the pointer. We then determine the position of the pointer for which the difference between left and right mean values (as measured by the t-statistic) reaches its maximum. Next, we determine the statistical significance of this potential cutting point, after filtering out short-scale heterogeneities below 3 kb by applying a coarse-graining technique. Finally, the program checks whether this significance exceeds a probability threshold. If so, the sequence is cut at this point into two subsequences; otherwise, the sequence remains undivided. The procedure continues recursively for each of the two resulting subsequences created by each cut. This leads to the decomposition of a chromosome sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C contents, each significantly different from the G+C contents of the adjacent LHGRs. Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological features such as gene density, SINE and LINE (short, long interspersed repetitive elements) densities, recombination rate or single nucleotide polymorphism variability. The resulting isochore maps are available at our web site (http://bioinfo2.ugr.es/isochores/), and also at the UCSC Genome Browser (http://genome.cse.ucsc.edu/). PMID:15215396
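    The recursive segmentation described in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the IsoFinder implementation: the function names, the window-based coarse-graining, and the fixed cutoff on the t-statistic (`threshold`) are assumptions made here, whereas the published method evaluates the statistical significance of each candidate cut against a probability threshold.

```python
from math import sqrt
from statistics import mean, variance


def gc_windows(seq, size):
    """Coarse-grain a DNA string into G+C fractions of non-overlapping windows."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + size]) / size
        for i in range(0, len(seq) - size + 1, size)
    ]


def best_cut(values, min_len=2):
    """Return (index, t) for the cut with the largest left/right t-statistic."""
    best_i, best_t = None, 0.0
    for i in range(min_len, len(values) - min_len + 1):
        left, right = values[:i], values[i:]
        diff = abs(mean(left) - mean(right))
        # Pooled-variance (Student) t-statistic between the two sides.
        pooled = ((len(left) - 1) * variance(left)
                  + (len(right) - 1) * variance(right)) / (len(values) - 2)
        se = sqrt(pooled * (1 / len(left) + 1 / len(right)))
        if se > 0:
            t = diff / se
        else:
            t = float("inf") if diff > 0 else 0.0
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t


def segment(values, threshold=5.0, offset=0, min_len=2):
    """Recursively split values; return (start, end) index pairs of segments."""
    i, t = best_cut(values, min_len)
    # Hypothetical stopping rule: a plain cutoff on t stands in for the
    # significance test used by the published algorithm.
    if i is None or t < threshold:
        return [(offset, offset + len(values))]
    return (segment(values[:i], threshold, offset, min_len)
            + segment(values[i:], threshold, offset + i, min_len))
```

    In use, a chromosome string would first be passed through `gc_windows` (for example with a 3,000-base window, echoing the 3 kb filter mentioned above), and the resulting list handed to `segment`; each returned index pair then marks one candidate homogeneous region.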

  5. A strategy to recover a high-quality, complete plastid sequence from low-coverage whole-genome sequencing

    PubMed Central

    Garaycochea, Silvia; Speranza, Pablo; Alvarez-Valin, Fernando

    2015-01-01

    Premise of the study: We developed a bioinformatic strategy to recover and assemble a chloroplast genome using data derived from low-coverage 454 GS FLX/Roche whole-genome sequencing. Methods: A comparative genomics approach was applied to obtain the complete chloroplast genome from a weedy biotype of rice from Uruguay. We also applied appropriate filters to discriminate reads representing novel DNA transfer events between the chloroplast and nuclear genomes. Results: From a set of 295,159 reads (96 Mb data), we assembled the chloroplast genome into two contigs. This weedy rice was classified based on 23 polymorphic regions identified by comparison with reference chloroplast genomes. We detected recent and past events of genetic material transfer between the chloroplast and nuclear genomes and estimated their occurrence frequency. Discussion: We obtained a high-quality complete chloroplast genome sequence from low-coverage sequencing data. Intergenome DNA transfer appears to be more frequent than previously thought. PMID:26504677

  6. Widespread endogenization of genome sequences of non-retroviral RNA viruses into plant genomes.

    PubMed

    Chiba, Sotaro; Kondo, Hideki; Tani, Akio; Saisho, Daisuke; Sakamoto, Wataru; Kanematsu, Satoko; Suzuki, Nobuhiro

    2011-07-01

    Non-retroviral RNA virus sequences (NRVSs) have been found in the chromosomes of vertebrates and fungi, but not plants. Here we report similarly endogenized NRVSs derived from plus-, negative-, and double-stranded RNA viruses in plant chromosomes. These sequences were found by searching public genomic sequence databases, and, importantly, most NRVSs were subsequently detected by direct molecular analyses of plant DNAs. The most widespread NRVSs were related to the coat protein (CP) genes of the family Partitiviridae which have bisegmented dsRNA genomes, and included plant- and fungus-infecting members. The CP of a novel fungal virus (Rosellinia necatrix partitivirus 2, RnPV2) had the greatest sequence similarity to Arabidopsis thaliana ILR2, which is thought to regulate the activities of the phytohormone auxin, indole-3-acetic acid (IAA). Furthermore, partitivirus CP-like sequences much more closely related to plant partitiviruses than to RnPV2 were identified in a wide range of plant species. In addition, the nucleocapsid protein genes of cytorhabdoviruses and varicosaviruses were found in species of over 9 plant families, including Brassicaceae and Solanaceae. A replicase-like sequence of a betaflexivirus was identified in the cucumber genome. The pattern of occurrence of NRVSs and the phylogenetic analyses of NRVSs and related viruses indicate that multiple independent integrations into many plant lineages may have occurred. For example, one of the NRVSs was retained in Ar. thaliana but not in Ar. lyrata or other related Camelina species, whereas another NRVS displayed the reverse pattern. Our study has shown that single- and double-stranded RNA viral sequences are widespread in plant genomes, and shows the potential of genome integrated NRVSs to contribute to resolve unclear phylogenetic relationships of plant species. PMID:21779172

  7. Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes

    PubMed Central

    Tani, Akio; Saisho, Daisuke; Sakamoto, Wataru; Kanematsu, Satoko; Suzuki, Nobuhiro

    2011-01-01

    Non-retroviral RNA virus sequences (NRVSs) have been found in the chromosomes of vertebrates and fungi, but not plants. Here we report similarly endogenized NRVSs derived from plus-, negative-, and double-stranded RNA viruses in plant chromosomes. These sequences were found by searching public genomic sequence databases, and, importantly, most NRVSs were subsequently detected by direct molecular analyses of plant DNAs. The most widespread NRVSs were related to the coat protein (CP) genes of the family Partitiviridae which have bisegmented dsRNA genomes, and included plant- and fungus-infecting members. The CP of a novel fungal virus (Rosellinia necatrix partitivirus 2, RnPV2) had the greatest sequence similarity to Arabidopsis thaliana ILR2, which is thought to regulate the activities of the phytohormone auxin, indole-3-acetic acid (IAA). Furthermore, partitivirus CP-like sequences much more closely related to plant partitiviruses than to RnPV2 were identified in a wide range of plant species. In addition, the nucleocapsid protein genes of cytorhabdoviruses and varicosaviruses were found in species of over 9 plant families, including Brassicaceae and Solanaceae. A replicase-like sequence of a betaflexivirus was identified in the cucumber genome. The pattern of occurrence of NRVSs and the phylogenetic analyses of NRVSs and related viruses indicate that multiple independent integrations into many plant lineages may have occurred. For example, one of the NRVSs was retained in Ar. thaliana but not in Ar. lyrata or other related Camelina species, whereas another NRVS displayed the reverse pattern. Our study has shown that single- and double-stranded RNA viral sequences are widespread in plant genomes, and shows the potential of genome integrated NRVSs to contribute to resolve unclear phylogenetic relationships of plant species. PMID:21779172

  8. Complete genome sequence of the soil actinomycete Kocuria rhizophila.

    PubMed

    Takarada, Hiromi; Sekine, Mitsuo; Kosugi, Hiroki; Matsuo, Yasunori; Fujisawa, Takatomo; Omata, Seiha; Kishi, Emi; Shimizu, Ai; Tsukatani, Naofumi; Tanikawa, Satoshi; Fujita, Nobuyuki; Harayama, Shigeaki

    2008-06-01

    The soil actinomycete Kocuria rhizophila belongs to the suborder Micrococcineae, a divergent bacterial group for which only a limited amount of genomic information is currently available. K. rhizophila is also important in industrial applications; e.g., it is commonly used as a standard quality control strain for antimicrobial susceptibility testing. Sequencing and annotation of the genome of K. rhizophila DC2201 (NBRC 103217) revealed a single circular chromosome (2,697,540 bp; G+C content of 71.16%) containing 2,357 predicted protein-coding genes. Most of the predicted proteins (87.7%) were orthologous to actinobacterial proteins, and the genome showed fairly good conservation of synteny with taxonomically related actinobacterial genomes. On the other hand, the genome seems to encode much smaller numbers of proteins necessary for secondary metabolism (one each of nonribosomal peptide synthetase and type III polyketide synthase), transcriptional regulation, and lateral gene transfer, reflecting the small genome size. The presence of probable metabolic pathways for the transformation of phenolic compounds generated from the decomposition of plant materials, and the presence of a large number of genes associated with membrane transport, particularly amino acid transporters and drug efflux pumps, may contribute to the organism's utilization of root exudates, as well as the tolerance to various organic compounds. PMID:18408034

  9. A Core Research Facility at Yale's West Campus The Yale Center for Genome Analysis

    E-print Network

    Harris, Jack

    Center for Genome Analysis, a state-of-the-art DNA sequencing facility located at Yale's West Campus--with its 434,000 square feet of state-of-the-art laboratory space--these resources are now available to West Campus. And in the future, supporting bot

  10. Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification.

    PubMed

    Bao, Yiming; Chetvernin, Vyacheslav; Tatusova, Tatiana

    2014-12-01

    The number of viral genome sequences in the public databases is increasing dramatically, and these sequences are playing an important role in virus classification. Pairwise sequence comparison is a sequence-based virus classification method. A program using this method calculates the pairwise identities of virus sequences within a virus family and displays their distribution, and visual analysis helps to determine demarcations at different taxonomic levels such as strain, species, genus and subfamily. Subsequent comparison of new sequences against existing ones allows viruses from which the new sequences were derived to be classified. Although this method cannot be used as the only criterion for virus classification in some cases, it is a quantitative method and has many advantages over conventional virus classification methods. It has been applied to several virus families, and there is an increasing interest in using this method for other virus families/groups. The Pairwise Sequence Comparison (PASC) classification tool was created at the National Center for Biotechnology Information. The tool's database stores pairwise identities for complete genomes/segments of 56 virus families/groups. Data in the system are updated every day to reflect changes in virus taxonomy and additions of new virus sequences to the public database. The web interface of the tool ( http://www.ncbi.nlm.nih.gov/sutils/pasc/ ) makes it easy to navigate and perform analyses. Multiple new viral genome sequences can be tested simultaneously with this system to suggest the taxonomic position of virus isolates in a specific family. PASC eliminates potential discrepancies in the results caused by different algorithms and/or different data used by researchers. PMID:25119676
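    The core idea of computing pairwise identities within a family and inspecting their distribution can be illustrated with a small Python sketch. It is not the PASC tool's code: it assumes the sequences are already aligned to equal length (with `-` for gaps), and the function names and the bin width are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap; a single gap is a mismatch.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def pairwise_identities(seqs):
    """Map every unordered pair of sequence names to its percent identity."""
    return {
        (m, n): identity(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


def histogram(identities, bin_width=10):
    """Count identities per bin; keys are the lower edges of the bins."""
    return Counter(int(v // bin_width) * bin_width for v in identities.values())
```

    Gaps or valleys in such a histogram are the kind of feature that visual analysis would use to propose demarcations between taxonomic levels; a new sequence could be placed by computing its identities against the existing set in the same way.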

  11. About The Center for Cancer Genomics (CCG)

    Cancer.gov

    CCG promotes opportunities to work with other agencies and community physicians to usher in a modern era of diagnosis, treatment, and prevention based on the study of genomes, gene expression, proteomics, and the use of other technologies.

  12. Genome Sequence and Comparative Genome Analysis of Lactobacillus casei: Insights into Their Niche-Associated Evolution

    PubMed Central

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F.; Broadbent, Jeff R.

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  13. Complete genome sequence of “Enterobacter lignolyticus” SCF1

    SciTech Connect

    DeAngelis, Kristen M.; D'Haeseleer, Patrik; Chivian, Dylan; Fortney, Julian L.; Khudyakov, Jane I.; Simmons, Blake A.; Woo, Hannah; Arkin, Adam P.; Davenport, Karen W.; Goodwin, Lynne A.; Chen, Amy; Ivanova, Natalia; Kyrpides, Nikos C.; Mavromatis, Konstantinos; Woyke, Tanja; Hazen, Terry C.

    2011-09-23

    In an effort to discover anaerobic bacteria capable of lignin degradation, we isolated 'Enterobacter lignolyticus' SCF1 on minimal media with alkali lignin as the sole source of carbon. This organism was isolated anaerobically from tropical forest soils collected from the Short Cloud Forest site in the El Yunque National Forest in Puerto Rico, USA, part of the Luquillo Long-Term Ecological Research Station. At this site, the soils experience strong fluctuations in redox potential and are net methane producers. Because of its ability to grow on lignin anaerobically, we sequenced the genome. The genome of 'E. lignolyticus' SCF1 is 4.81 Mbp with no detected plasmids, and includes a relatively small arsenal of lignocellulolytic carbohydrate active enzymes. Lignin degradation was observed in culture, and the genome revealed two putative laccases, a putative peroxidase, and a complete 4-hydroxyphenylacetate degradation pathway encoded in a single gene cluster.

  14. Adaptive radiation of Darwin's finches revisited using whole genome sequencing.

    PubMed

    Almén, Markus Sällman; Lamichhaney, Sangeet; Berglund, Jonas; Grant, B Rosemary; Grant, Peter R; Webster, Matthew T; Andersson, Leif

    2016-01-01

    We recently used genome sequencing to study the evolutionary history of the Darwin's finches. A prominent feature of our data was that different polymorphic sites in the genome tended to indicate different genetic relationships among these closely related species. Such patterns are expected in recently diverged genomes as a result of incomplete lineage sorting. However, we uncovered conclusive evidence that these patterns have also been influenced by interspecies hybridisation, a process that has likely played an important role in the radiation of Darwin's finches. A major discovery was that segregation of two haplotypes at the ALX1 locus underlies variation in beak shape among the Darwin's finches, and that differences between the two haplotypes in a 240 kb region in blunt and pointed beaked birds involve both coding and regulatory changes. As we review herein, the evolution of such adaptive haplotypes comprising multiple causal changes appears to be an important mechanism contributing to the evolution of biodiversity. PMID:26606649

  15. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    PubMed Central

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Sikorski, Johannes; Pukall, Rüdiger; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Arcanobacterium haemolyticum (ex MacLean et al. 1946) Collins et al. 1983 is the type species of the genus Arcanobacterium, which belongs to the family Actinomycetaceae. The strain is of interest because it is an obligate parasite of the pharynx of humans and farm animals; occasionally, it causes pharyngeal or skin lesions. It is a Gram-positive, nonmotile and non-sporulating bacterium. The strain described in this study was isolated from infections amongst American soldiers of certain islands of the North and West Pacific. This is the first completed sequence of a member of the genus Arcanobacterium and the ninth type strain genome from the family Actinomycetaceae. The 1,986,154 bp long genome with its 1,821 protein-coding and 64 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304742

  16. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  17. Draft Genome Sequence of Colletotrichum acutatum Sensu Lato (Colletotrichum fioriniae).

    PubMed

    Baroncelli, Riccardo; Sreenivasaprasad, Surapareddy; Sukno, Serenella A; Thon, Michael R; Holub, Eric

    2014-01-01

    In addition to its economic impact, Colletotrichum acutatum sensu lato is an interesting model for molecular investigations due to the diversity of host-determined specialization and reproductive lifestyles within the species complex. The pathogen Colletotrichum fioriniae forms part of this species complex and causes anthracnose in a wide range of crops and wild plants worldwide. Some members of this species have also been reported to be entomopathogenic. Here, we report the draft genome sequence of a heterothallic reference isolate of C. fioriniae (strain PJ7). This sequence provides a range of new resources that serve as a useful platform for further research in the field. PMID:24723700

  18. Permanent draft genome sequence of Comamonas testosteroni KF-1

    PubMed Central

    Weiss, Michael; Kesberg, Anna I.; LaButti, Kurt M.; Pitluck, Sam; Bruce, David; Hauser, Loren; Copeland, Alex; Woyke, Tanja; Lowry, Stephen; Lucas, Susan; Land, Miriam; Goodwin, Lynne; Kjelleberg, Staffan; Cook, Alasdair M.; Buhmann, Matthias; Thomas, Torsten; Schleheck, David

    2013-01-01

    Comamonas testosteroni KF-1 is a model organism for the elucidation of the novel biochemical degradation pathways for xenobiotic 4-sulfophenylcarboxylates (SPC) formed during biodegradation of synthetic 4-sulfophenylalkane surfactants (linear alkylbenzenesulfonates, LAS) by bacterial communities. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,026,527 bp long chromosome (one sequencing gap) exhibits an average G+C content of 61.79% and is predicted to encode 5,492 protein-coding genes and 114 RNA genes. PMID:23991256

  19. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  20. Permanent draft genome sequence of Comamonas testosteroni KF-1

    SciTech Connect

    Weiss, Michael; Kesberg, Anna I; LaButti, Kurt; Pitluck, Sam; Bruce, David; Hauser, Loren John; Copeland, A; Woyke, Tanja; Lowry, Stephen; Lucas, Susan; Land, Miriam L; Goodwin, Lynne A.; Kjelleberg, Staffan; Cook, Alasdair M.; Buhmann, Matthias; Thomas, Torsten; Schleheck, David

    2013-01-01

    Comamonas testosteroni KF-1 is a model organism for the elucidation of the novel biochemical degradation pathways for xenobiotic 4-sulfophenylcarboxylates (SPC) formed during biodegradation of synthetic 4-sulfophenylalkane surfactants (linear alkylbenzenesulfonates, LAS) by bacterial communities. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,026,527 bp long chromosome (one sequencing gap) exhibits an average G+C content of 61.79% and is predicted to encode 5,492 protein-coding genes and 114 RNA genes.

  1. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools

    PubMed Central

    2013-01-01

    Background: De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. Results: The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Conclusions: Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not a high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort. PMID:23547799

  2. Hellbender Genome Sequences Shed Light on Genomic Expansion at the Base of Crown Salamanders

    PubMed Central

    Sun, Cheng; Mueller, Rachel Lockridge

    2014-01-01

    Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders’ high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade’s base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes. PMID:25115007

  3. Complete Genome Sequence of Polypropylene Glycol- and Polyethylene Glycol-Degrading Sphingopyxis macrogoltabida Strain EY-1

    PubMed Central

    Nagata, Yuji; Numata, Mitsuru; Tsuchikane, Kieko; Hosoyama, Akira; Yamazoe, Atsushi; Tsuda, Masataka; Fujita, Nobuyuki; Kawai, Fusako

    2015-01-01

    Strain EY-1 was isolated from a microbial consortium growing on a random polymer of ethylene oxide and propylene oxide. Strain EY-1 grew on polyethylene glycol and polypropylene glycol and was identified as Sphingopyxis macrogoltabida. Here, we report the complete genome sequence of Sphingopyxis macrogoltabida EY-1. The genome of strain EY-1 comprises a 4.76-Mb circular chromosome and five plasmids. The whole finishing was conducted in silico, with the aid of the computational tools GenoFinisher and AceFileViewer. Strain EY-1 is available from Biological Resource Center, National Institute of Technology and Evaluation (Tokyo, Japan) (NITE). PMID:26634754

  4. Complete Genome Sequence of a Phenanthrene Degrader, Burkholderia sp. HB-1 (NBRC 110738)

    PubMed Central

    Moriya, Azusa; Kato, Hiromi; Ogawa, Natsumi; Nagata, Yuji; Tsuda, Masataka

    2015-01-01

    The phenanthrene-degrading Burkholderia sp. HB-1 was isolated from a phenanthrene-enrichment culture seeded with a pristine farm soil sample. We report the complete genome sequence of HB-1, which has been deposited to the stock culture (NBRC 110738) at Biological Resource Center, National Institute of Technology and Evaluation (NITE), Tokyo, Japan. The genome of strain HB-1 comprises two circular chromosomes of 4.1 Mb and 3.1 Mb. The finishing was facilitated by the computational tools GenoFinisher, AceFileViewer, and ShortReadManager. PMID:26543118

  5. Complete Genome Sequence of Polypropylene Glycol- and Polyethylene Glycol-Degrading Sphingopyxis macrogoltabida Strain EY-1.

    PubMed

    Ohtsubo, Yoshiyuki; Nagata, Yuji; Numata, Mitsuru; Tsuchikane, Kieko; Hosoyama, Akira; Yamazoe, Atsushi; Tsuda, Masataka; Fujita, Nobuyuki; Kawai, Fusako

    2015-01-01

    Strain EY-1 was isolated from a microbial consortium growing on a random polymer of ethylene oxide and propylene oxide. Strain EY-1 grew on polyethylene glycol and polypropylene glycol and was identified as Sphingopyxis macrogoltabida. Here, we report the complete genome sequence of Sphingopyxis macrogoltabida EY-1. The genome of strain EY-1 comprises a 4.76-Mb circular chromosome and five plasmids. The whole finishing was conducted in silico, with the aid of the computational tools GenoFinisher and AceFileViewer. Strain EY-1 is available from Biological Resource Center, National Institute of Technology and Evaluation (Tokyo, Japan) (NITE). PMID:26634754

  6. The tomato genome sequence provides insights into fleshy fruit evolution.

    PubMed

    2012-05-31

    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness. PMID:22660326

  7. Complete genome sequence of Desulfomicrobium baculatum type strain (XT)

    SciTech Connect

    Copeland, Alex; Spring, Stefan; Goker, Markus; Schneider, Susanne; Lapidus, Alla; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C; Meincke, Linda; Sims, David; Brettin, Thomas; Detter, John C; Han, Cliff; Chain, Patrick; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C; Lucas, Susan

    2009-05-20

    Desulfomicrobium baculatum is the type species of the genus Desulfomicrobium, which is the type genus of the family Desulfomicrobiaceae. It is of phylogenetic interest because of the isolated location of the family Desulfomicrobiaceae within the order Desulfovibrionales. D. baculatum strain XT is a Gram-negative, motile, sulfate-reducing bacterium isolated from water-saturated manganese carbonate ore. It is strictly anaerobic and does not require NaCl for growth, although NaCl concentrations up to 6% (w/v) are tolerated. The metabolism is respiratory or fermentative. In the presence of sulfate, pyruvate and lactate are incompletely oxidized to acetate and CO2. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the deltaproteobacterial family Desulfomicrobiaceae, and this 3,942,657 bp long single replicon genome with its 3494 protein-coding and 72 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Complete genome sequence of Halanaerobium praevalens type strain (GSLT)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Chertkov, Olga; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Pagani, Ioanna; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Kannan, K. Palani; Rohde, Manfred; Tindall, Brian; Goker, Markus; Detter, J. Chris; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2011-01-01

    Halanaerobium praevalens Zeikus et al. 1984 is the type species of the genus Halanaerobium, which in turn is the type genus of the family Halanaerobiaceae. The species is of interest because it is able to reduce a variety of nitro-substituted aromatic compounds at a high rate, and because of its ability to degrade organic pollutants. The strain is also of interest because it functions as a hydrolytic bacterium, fermenting complex organic matter and producing intermediary metabolites for other trophic groups such as sulfate-reducing and methanogenic bacteria. It is further reported as being involved in carbon removal in the Great Salt Lake, its source of isolation. This is the first completed genome sequence of a representative of the genus Halanaerobium and the second genome sequence from a type strain of the family Halanaerobiaceae. The 2,309,262 bp long genome with its 2,110 protein-coding and 70 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Structure and sequence of the saimiriine herpesvirus 1 genome

    PubMed Central

    Tyler, Shaun; Severini, Alberto; Black, Darla; Walker, Matthew; Eberle, R.

    2010-01-01

    We report here the complete genome sequence of the squirrel monkey α-herpesvirus saimiriine herpesvirus 1 (HVS1). Unlike the simplexviruses of other primate species, only the unique short region of the HVS1 genome is bounded by inverted repeats. While all Old World simian simplexviruses characterized to date lack the herpes simplex virus RL1 (γ34.5) gene, HVS1 has an RL1 gene. HVS1 lacks several genes that are present in other primate simplexviruses (US8.5, US10–12, UL43/43.5 and UL49A). Although the overall genome structure appears more like that of varicelloviruses, the encoded HVS1 proteins are most closely related to homologous proteins of the primate simplexviruses. Phylogenetic analyses confirm that HVS1 is a simplexvirus. Limited comparison of two HVS1 strains revealed a very low degree of sequence variation more typical of varicelloviruses. HVS1 is thus unique among the primate α-herpesviruses in that its genome has properties of both simplexviruses and varicelloviruses. PMID:21130483

  10. Complete genome sequence of Kytococcus sedentarius type strain (541T)

    PubMed Central

    Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Schneider, Susanne; Göker, Markus; Pukall, Rüdiger; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-01-01

    Kytococcus sedentarius (ZoBell and Upham 1944) Stackebrandt et al. 1995 is the type strain of the species, and is of phylogenetic interest because of its location in the Dermacoccaceae, a poorly studied family within the actinobacterial suborder Micrococcineae. Kytococcus sedentarius is known for the production of oligoketide antibiotics as well as for its role as an opportunistic pathogen causing valve endocarditis, hemorrhagic pneumonia, and pitted keratolysis. It is strictly aerobic and can only grow when several amino acids are provided in the medium. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from a marine environment. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Dermacoccaceae and the 2,785,024 bp long single replicon genome with its 2639 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304632

  11. Complete genome sequence of Bifidobacterium animalis subsp. lactis BLC1.

    PubMed

    Bottacini, Francesca; Dal Bello, Fabio; Turroni, Francesca; Milani, Christian; Duranti, Sabrina; Foroni, Elena; Viappiani, Alice; Strati, Francesco; Mora, Diego; van Sinderen, Douwe; Ventura, Marco

    2011-11-01

    Bifidobacterium animalis subsp. lactis BLC1 is a probiotic bacterium that is widely exploited by food industries as the active ingredient of various functional foods. Here we report the complete genome sequence of B. animalis subsp. lactis BLC1, which is expected to provide insights into the biology of this health-promoting microorganism and improve our understanding of its phylogenetic relatedness with other members of the B. animalis subsp. lactis taxon. PMID:22038957

  12. The complete mitochondrial genome sequence of Hemiculter leucisculus.

    PubMed

    Dong, Fang; Tong, Guang-Xiang; Kuang, You-Yi; Zheng, Xian-Hu; Sun, Xiao-Wen

    2015-10-01

    The complete mitochondrial genome of Hemiculter leucisculus was determined to be 16,617 bp in length. It contains 22 transfer RNA genes, 13 protein-coding genes, 2 ribosomal RNA genes, and a non-coding control region (D-loop). The critical central conserved sequences (CSB-D, CSB-E, and CSB-F) were also detected. The H. leucisculus mitogenome should be a useful resource for studies of genetic diversity and population vitality in Cyprinidae. PMID:24460158

  13. Genome Sequence of the Human Pathogen Vibrio cholerae Amazonia

    PubMed Central

    Thompson, Cristiane C.; Marin, Michel A.; Dias, Graciela M.; Dutilh, Bas E.; Edwards, Robert A.; Iida, Tetsuya; Thompson, Fabiano L.; Vicente, Ana Carolina P.

    2011-01-01

    Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. Based on multilocus sequence analysis, this lineage belongs to a distinct profile compared to strains from El Tor and classical biotypes. The genomic analysis revealed that it contains Vibrio pathogenicity island 2 and a set of genes related to pathogenesis and fitness, such as the type VI secretion system, present in choleragenic V. cholerae strains. PMID:21952545

  14. Genome Sequences of Three Novel Bacillus cereus Bacteriophages.

    PubMed

    Grose, Julianne H; Jensen, Jordan D; Merrill, Bryan D; Fisher, Joshua N B; Burnett, Sandra H; Breakwell, Donald P

    2014-01-01

    The Bacillus cereus group is an assemblage of highly related firmicute bacteria that cause a variety of diseases in animals, including insects and humans. We announce three high-quality, complete genome sequences of bacteriophages we isolated from soil samples taken at the bases of fruit trees in Utah County, Utah. While two of the phages (Shanette and JL) are highly related myoviruses, the bacteriophage Basilisk is a siphovirus. PMID:24459255

  15. Genome Sequence of Propionibacterium acnes Type II Strain ATCC 11828

    PubMed Central

    Horváth, Balázs; Hunyadkürti, Judit; Vörös, Andrea; Fekete, Csaba; Urbán, Edit; Kemény, Lajos

    2012-01-01

    Propionibacterium acnes is an anaerobic Gram-positive bacterium that forms part of the normal human cutaneous microbiota and is occasionally associated with inflammatory diseases (I. Kurokawa et al., Exp. Dermatol. 18:821–832, 2009). Here we present the complete genome sequence for the commercially available P. acnes type II reference strain ATCC 11828 (I. Nagy et al., Microbes Infect. 8:2195–2205, 2006) recovered from a subcutaneous abscess. PMID:22156398

  16. Full genome sequences of two reticuloendotheliosis viruses contaminating commercial vaccines.

    PubMed

    Liu, Qinfang; Zhao, Jixun; Su, Jingliang; Pu, Juan; Zhang, Guozhong; Liu, Jinhua

    2009-09-01

    Reticuloendotheliosis virus (REV) fragments are a common contaminant in some commercial vaccines such as fowl poxvirus (FPV) and Marek's disease virus. However, only those strains integrating or containing a near-intact REV provirus are more likely to cause problems in the field. We confirm here, by PCR assays and animal experiments, that vaccines against FPV and herpes virus of turkeys were contaminated with full genome sequences of REV. Further, we determined the complete proviral sequence of two REV isolates from contaminated vaccines. Two REV isolates (REV-99 and REV-06) present in the vaccines were both replication competent, and their proviral genome was 8286 nucleotides in length with two identical long terminal repeats (LTR). The complete genome in these two REV isolates shared 99.8% identity to APC-566 and fowl poxvirus REV proviral inserts (FPV-REV). REV-99 and REV-06 LTR showed over 99% identity to chicken syncytial virus (CSV), but an identity of only 75.8% and 78.0%, respectively, to SNV. Alignments with other available REV gag, pol, and env sequences revealed high similarity at the nucleotide level. The results further indicated that the prototype CSV may be the most-important REV contaminant in the commercial vaccines, and distinct genotypes of REVs may cocirculate in chicken flocks of China at the present time. PMID:19848070

  17. Identification and Analysis of Gene Families from the Duplicated Genome of Soybean Using EST Sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Large scale gene analysis of most organisms is hampered by incomplete genomic sequences. In many organisms, such as soybean, the best source of sequence information is the existence of expressed sequence tag (EST) libraries. Soybean has a large (1115 Mbp) genome that has yet to be fully sequenced....

  18. Complete genome sequence of an emerging genotype of tobacco streak virus in the U.S.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report the complete genome sequence of an emerging genotype of Tobacco streak virus (TSV) infecting zucchini squash in Florida (TSV_FL13-07), obtained through deep sequencing of sRNAs and validated by Sanger sequencing. TSV_FL13-07 shares less than 90% sequence identity in three genomic ribonucleic...

  19. Fuzzy Classification of Genome Sequences Prior to Assembly Based on Similarity Measures*

    E-print Network

    Nicolescu, Monica

    with ambiguity in trying to match and assemble a genome from its sequenced subsequences. This research develops are then given to a sequencing machine. Fragments or subsequences are selected randomly; using a sequence once. Sequencing DNA using the shot-gun method was introduced in 1995 [3]. More details about whole genome shot-gun

  20. Global Genomic Diversity of Human Papillomavirus 6 Based on 724 Isolates and 190 Complete Genome Sequences

    PubMed Central

    Jelen, Mateja M.; Chen, Zigui; Kocjan, Boštjan J.; Burt, Felicity J.; Chan, Paul K. S.; Chouhy, Diego; Combrinck, Catharina E.; Coutlée, François; Estrade, Christine; Ferenczy, Alex; Fiander, Alison; Franco, Eduardo L.; Garland, Suzanne M.; Giri, Adriana A.; González, Joaquín Víctor; Gröning, Arndt; Heidrich, Kerstin; Hibbitts, Sam; Hošnjak, Lea; Luk, Tommy N. M.; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Richardson, Harriet; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y.; Seme, Katja; Severini, Alberto; Sinchi, Jessica L.; Smahelova, Jana; Tabrizi, Sepehr N.; Tachezy, Ruth; Tohme, Sarah; Uloza, Virgilijus; Vitkauskiene, Astra; Wong, Yong Wee; Židovec Lepej, Snježana; Burk, Robert D.

    2014-01-01

    ABSTRACT Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. IMPORTANCE This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. 
    Two HPV6 variant lineages and five sublineages were identified and showed some degree of association with geographical location, anatomical site of infection/disease, and/or gender. We additionally identified several HPV6 lineage- and sublineage-specific SNPs to facilitate the identification of HPV6 variants and determined a representative region within the L2 gene that is suitable for HPV6 whole-genome-based phylogenetic analysis. This study complements and significantly expands the current knowledge of HPV6 genetic diversity and forms a comprehensive basis for future epidemiological, evolutionary, functional, pathogenicity, vaccination, and molecular assay development studies. PMID:24741079