Sample records for genome sequence typing

  1. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.

    PubMed

    Ghanem, Mostafa; El-Gazzar, Mohamed

    2018-05-01

    Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus based sequence typing schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing became close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based whole genome sequences. A diverse set of 25 MS whole genome sequences were used to identify 302 core genome genes as cgMLST targets (35.5% of MS genome) and 44 whole genome sequences of MS isolates from six countries in four continents were used for typing applying this scheme. cgMLST based phylogenetic trees displayed a high degree of agreement with core genome SNP based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE PAGES

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  3. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  4. First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.

    PubMed

    Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini

    2018-03-01

    Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  5. First High-Quality Draft Genome Sequence of Pasteurella multocida Sequence Type 128 Isolated from Infected Bone.

    PubMed

    Kavousi, Niloofar; Eng, Wilhelm Wei Han; Lee, Yin Peng; Tan, Lian Huat; Thuraisingham, Ravindran; Yule, Catherine M; Gan, Han Ming

    2016-03-03

    We report here the first high-quality draft genome sequence of Pasteurella multocida sequence type 128, which was isolated from the infected finger bone of an adult female who was bitten by a domestic dog. The draft genome will be a valuable addition to the scarce genomic resources available for P. multocida. Copyright © 2016 Kavousi et al.

  6. Draft Genome Sequences for Two Metal-Reducing Pelosinus fermentans Strains Isolated from a Cr(VI)-Contaminated Site and for Type Strain R7

    PubMed Central

    Brown, Steven D.; Podar, Mircea; Klingeman, Dawn M.; Johnson, Courtney M.; Yang, Zamin K.; Utturkar, Sagar M.; Land, Miriam L.; Mosher, Jennifer J.; Hurt, Richard A.; Phelps, Tommy J.; Palumbo, Anthony V.; Arkin, Adam P.; Hazen, Terry C.

    2012-01-01

    Pelosinus fermentans 16S rRNA gene sequences have been reported from diverse geographical sites since the recent isolation of the type strain. We present the genome sequence of the P. fermentans type strain R7 (DSM 17108) and genome sequences for two new strains with different abilities to reduce iron, chromate, and uranium. PMID:22933770

  7. Complete genome sequence of Streptosporangium roseum type strain (NI 9100T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nolan, Matt; Sikorski, Johannes; Jando, Marlen

    2010-01-01

    Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The pinkish coiled Streptomyces-like organism with a spore case was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaeamore » project.« less

  8. Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

    PubMed

    Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C

    2018-06-01

    There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh-the most important blood groups-cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Whole-Genome Sequences of Listeria monocytogenes Sequence Type 6 Isolates Associated with a Large Foodborne Outbreak in South Africa, 2017 to 2018

    PubMed Central

    Tau, Nomsa; Smouse, Shannon L.; Mtshali, Phillip S.; Mnyameni, Florah; Khumalo, Zamantungwa T. H.; Ismail, Arshad; Govender, Nevashan; Thomas, Juno

    2018-01-01

    ABSTRACT We report whole-genome sequences for 10 Listeria monocytogenes sequence type 6 isolates associated with a large listeriosis outbreak in South Africa, which occurred over the period of 2017 to 2018. The possibility of listeriosis spreading beyond South Africa’s borders as a result of exported contaminated food products prompted us to make the genome sequences publicly available. PMID:29930052

  10. A RESTful application programming interface for the PubMLST molecular typing and genome databases

    PubMed Central

    Bray, James E.; Maiden, Martin C. J.

    2017-01-01

    Abstract Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research in to population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452

  11. Complete genome sequence of the hippuricase-positive Campylobacter avium type strain LMG 24591

    USDA-ARS?s Scientific Manuscript database

    Campylobacter avium is a hippurate-positive, thermotolerant campylobacter that has been isolated from poultry. Here we present the genome sequences of two C. avium strains isolated from broiler chickens: strains LMG 24591T (complete genome) and LMG 24592 (draft genome). The C. avium type strain geno...

  12. Reads2Type: a web application for rapid microbial taxonomy identification.

    PubMed

    Saputra, Dhany; Rasmussen, Simon; Larsen, Mette V; Haddad, Nizar; Sperotto, Maria Maddalena; Aarestrup, Frank M; Lund, Ole; Sicheritz-Pontén, Thomas

    2015-11-25

    Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available bacteria complete genomes. Using a dataset of 1003 whole genome sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5 % accuracy and on the minutes time scale. In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of whom utilizes the web application. This also prevents data privacy issues to arise. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.

  13. Whole-Genome Sequences of Listeria monocytogenes Sequence Type 6 Isolates Associated with a Large Foodborne Outbreak in South Africa, 2017 to 2018.

    PubMed

    Allam, Mushal; Tau, Nomsa; Smouse, Shannon L; Mtshali, Phillip S; Mnyameni, Florah; Khumalo, Zamantungwa T H; Ismail, Arshad; Govender, Nevashan; Thomas, Juno; Smith, Anthony M

    2018-06-21

    We report whole-genome sequences for 10 Listeria monocytogenes sequence type 6 isolates associated with a large listeriosis outbreak in South Africa, which occurred over the period of 2017 to 2018. The possibility of listeriosis spreading beyond South Africa's borders as a result of exported contaminated food products prompted us to make the genome sequences publicly available. Copyright © 2018 Allam et al.

  14. Complete Genome Sequences of Isolates of Enterococcus faecium Sequence Type 117, a Globally Disseminated Multidrug-Resistant Clone

    PubMed Central

    Tedim, Ana P.; Lanza, Val F.; Manrique, Marina; Pareja, Eduardo; Ruiz-Garbajosa, Patricia; Cantón, Rafael; Baquero, Fernando; Tobes, Raquel

    2017-01-01

    ABSTRACT The emergence of nosocomial infections by multidrug-resistant sequence type 117 (ST117) Enterococcus faecium has been reported in several European countries. ST117 has been detected in Spanish hospitals as one of the main causes of bloodstream infections. We analyzed genome variations of ST117 strains isolated in Madrid and describe the first ST117 closed genome sequences. PMID:28360174

  15. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  16. Preferences for learning different types of genome sequencing results among young breast cancer patients: Role of psychological and clinical factors.

    PubMed

    Kaphingst, Kimberly A; Ivanovich, Jennifer; Lyons, Sarah; Biesecker, Barbara; Dresser, Rebecca; Elrick, Ashley; Matsen, Cindy; Goodman, Melody

    2018-01-29

    The growing importance of genome sequencing means that patients will increasingly face decisions regarding what results they would like to learn. The present study examined psychological and clinical factors that might affect these preferences. 1,080 women diagnosed with breast cancer at age 40 or younger completed an online survey. We assessed their interest in learning various types of genome sequencing results: risk of preventable disease or unpreventable disease, cancer treatment response, uncertain meaning, risk to relatives' health, and ancestry/physical traits. Multivariable logistic regression was used to examine whether being "very" interested in each result type was associated with clinical factors: BRCA1/2 mutation status, prior genetic testing, family history of breast cancer, and psychological factors: cancer recurrence worry, genetic risk worry, future orientation, health information orientation, and genome sequencing knowledge. The proportion of respondents who were very interested in learning each type of result ranged from 16% to 77%. In all multivariable models, those who were very interested in learning a result type had significantly higher knowledge about sequencing benefits, greater genetic risks worry, and stronger health information orientation compared to those with less interest (p-values < .05). Our findings indicate that high interest in return of various types of genome sequencing results was more closely related to psychological factors. Shared decision-making approaches that increase knowledge about genome sequencing and incorporate patient preferences for health information and learning about genetic risks may help support patients' informed choices about learning different types of sequencing results. © Society of Behavioral Medicine 2018.

  17. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    PubMed

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and nominated in 2005 and later been found in many hospitals. In this paper, the whole genome sequencing of this strain was carried out. The total genome size of EN-119(T) is 4952,770 bp with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii, which will further our understanding of Ecc. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. The global catalogue of microorganisms 10K type strain sequencing project: closing the genomic gaps for the validly published prokaryotic and fungi species.

    PubMed

    Wu, Linhuan; McCluskey, Kevin; Desmeth, Philippe; Liu, Shuangjiang; Hideaki, Sugawara; Yin, Ye; Moriya, Ohkuma; Itoh, Takashi; Kim, Cha Young; Lee, Jung-Sook; Zhou, Yuguang; Kawasaki, Hiroko; Hazbón, Manzour Hernando; Robert, Vincent; Boekhout, Teun; Lima, Nelson; Evtushenko, Lyudmila; Boundy-Mills, Kyria; Bunk, Boyke; Moore, Edward R B; Eurwilaichitr, Lily; Ingsriswang, Supawadee; Shah, Heena; Yao, Su; Jin, Tao; Huang, Jinqun; Shi, Wenyu; Sun, Qinglan; Fan, Guomei; Li, Wei; Li, Xian; Kurtböke, Ipek; Ma, Juncai

    2018-05-01

    Genomic information is essential for taxonomic, phylogenetic, and functional studies to comprehensively decipher the characteristics of microorganisms, to explore microbiomes through metagenomics, and to answer fundamental questions of nature and human life. However, large gaps remain in the available genomic sequencing information published for bacterial and archaeal species, and the gaps are even larger for fungal type strains. The Global Catalogue of Microorganisms (GCM) leads an internationally coordinated effort to sequence type strains and close gaps in the genomic maps of microorganisms. Hence, the GCM aims to promote research by deep-mining genomic data.

  19. Genome Sequence of the Enterobacter mori Type Strain, LMG 25706, a Pathogenic Bacterium of Morus alba L. ▿

    PubMed Central

    Zhu, Bo; Zhang, Guo-Qing; Lou, Miao-Miao; Tian, Wen-Xiao; Li, Bin; Zhou, Xue-Ping; Wang, Guo-Feng; Liu, He; Xie, Guan-Lin; Jin, Gu-Lei

    2011-01-01

    Enterobacter mori is a plant-pathogenic enterobacterium responsible for the bacterial wilt of Morus alba L. Here we present the draft genome sequence of the type strain, LMG 25706. To the best of our knowledge, this is the first genome sequence of a plant-pathogenic bacterium in the genus Enterobacter. PMID:21602328

  20. Complete Genome Sequence of Salmonella enterica Serovar Typhimurium Strain YU15 (Sequence Type 19) Harboring the Salmonella Genomic Island 1 and Virulence Plasmid pSTV

    PubMed Central

    Calva, Edmundo; Puente, José L.; Zaidi, Mussaret B.

    2016-01-01

    The complete genome of Salmonella enterica subsp. enterica serovar Typhimurium sequence type 19 (ST19) strain YU15, isolated in Yucatán, Mexico, from a human baby stool culture, was determined using PacBio technology. The chromosome contains five intact prophages and the Salmonella genomic island 1 (SGI1). This strain carries the Salmonella virulence plasmid pSTV. PMID:27081132

  1. Analysis of Illumina Microbial Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clum, Alicia; Foster, Brian; Froula, Jeff

    2010-05-28

    Since the emerging of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaken. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as wellmore » as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.« less

  2. The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies.

    PubMed

    Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N

    2016-01-01

    For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.

  3. Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T)

    PubMed Central

    Chertkov, Olga; Sikorski, Johannes; Brambilla, Evelyne; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Spring, Stefan; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304712

  4. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

    DOE PAGES

    Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...

    2014-06-15

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less

  5. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less

  6. ProDeGe: A computational protocol for fully automated decontamination of genomes

    DOE PAGES

    Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; ...

    2015-06-09

    Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies.more » On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.« less

  7. ProDeGe: A computational protocol for fully automated decontamination of genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott

    Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies.more » On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.« less

  8. Detection of somatic, subclonal and mosaic CNVs from sequencing | Division of Cancer Prevention

    Cancer.gov

    Progress in technology has made individual genome sequencing a clinical reality, with partial genome sequencing already in use in clinical care. In fact, it is expected that within a few years whole genome sequencing will be a standard procedure that will allow discovering personal genomic variants of all types and thus greatly facilitate individualized medicine. However, fast

  9. Complete genome sequence of Parvibaculum lavamentivorans type strain (DS-1(T)).

    PubMed

    Schleheck, David; Weiss, Michael; Pitluck, Sam; Bruce, David; Land, Miriam L; Han, Shunsheng; Saunders, Elizabeth; Tapia, Roxanne; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Goodwin, Lynne; Pennacchio, Len; Nolan, Matt; Cook, Alasdair M; Kjelleberg, Staffan; Thomas, Torsten

    2011-12-31

    Parvibaculum lavamentivorans DS-1(T) is the type species of the novel genus Parvibaculum in the novel family Rhodobiaceae (formerly Phyllobacteriaceae) of the order Rhizobiales of Alphaproteobacteria. Strain DS-1(T) is a non-pigmented, aerobic, heterotrophic bacterium and represents the first tier member of environmentally important bacterial communities that catalyze the complete degradation of synthetic laundry surfactants. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,914,745 bp long genome with its predicted 3,654 protein coding genes is the first completed genome sequence of the genus Parvibaculum, and the first genome sequence of a representative of the family Rhodobiaceae.

  10. Genome sequence of the mud-dwelling archaeon Methanoplanus limicola type strain (DSM 2279 T), reclassification of Methanoplanus petrolearius as Methanolacinia petrolearia and emended descriptions of the genera Methanoplanus and Methanolacinia

    DOE PAGES

    Goker, Markus; Lu, Megan; Fiebig, Anne; ...

    2014-06-15

    Methanoplanus limicola Wildgruber et al. 1984 is a mesophilic methanogen that was isolated from a swamp composed of drilling waste near Naples, Italy, shortly after the Archaea were recognized as a separate domain of life. Methanoplanus is the type genus in the family Methanoplanaceae, a taxon that felt into disuse since modern 16S rRNA gene sequences-based taxonomy was established. Methanoplanus is now placed within the Methanomicrobiaceae, a family that is so far poorly characterized at the genome level. The only other type strain of the genus with a sequenced genome, Methanoplanus petrolearius SEBR 4847 T, turned out to be misclassifiedmore » and required reclassification to Methanolacinia. Both, Methanoplanus and Methanolacinia, needed taxonomic emendations due to a significant deviation of the G+C content of their genomes from previously published (pregenome-sequence era) values. Until now genome sequences were published for only four of the 33 species with validly published names in the Methanomicrobiaceae. Here we describe the features of M. limicola, together with the improved-high-quality draft genome sequence and an notation of the type strain, M3 T. The 3,200,946 bp long chromosome (permanent draft sequence) with its 3,064 protein-coding and 65 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  11. Draft genome sequence of CTX-M-type β-lactamase-producing Klebsiella quasipneumoniae subsp. similipneumoniae isolated from a Box turtle.

    PubMed

    Li, Chien-Feng; Tang, Hui-Ling; Chiou, Chien-Shun; Tung, Kwong-Chung; Lu, Min-Chi; Lai, Yi-Chyi

    2018-03-01

    Klebsiella spp. are regarded as major pathogens causing infections in humans and various animals. Here we report the draft genome sequence of a CTX-M-type β-lactamase-producing Klebsiella quasipneumoniae subsp. similipneumoniae strain CHKP0062 isolated from a Yellow-margined Box turtle. An Illumina-Solexa platform was used to sequence the genome of CHKP0062. Qualified reads were assembled de novo using Velvet. The draft genome was annotated by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). The resistome and virulome of the strain were investigated. A total of 5423 protein-coding sequences, 87 tRNAs, 24 rRNAs and 12 ncRNAs were identified in the 5 699 275-bp genome. CHKP0062 was assigned to sequence type ST2131 with the K-loci type as KL67. No virulence-associated genes were identified. However, numerous antimicrobial resistance genes were present in this strain. Plasmid contigs were assembled and revealed homology to the multidrug resistance plasmids pC15-K, pCTX-M3 and pKF3-94, with the carriage of the class A β-lactamase genes bla TEM-1b and bla CTX-M-3 . The genome sequence reported in this study will be useful for comparative genomic analysis regarding the dissemination of clinically important antibiotic resistance genes among Klebsiella spp. isolated from humans and animals. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  12. Pstl repeat: a family of short interspersed nucleotide element (SINE)-like sequences in the genomes of cattle, goat, and buffalo.

    PubMed

    Sheikh, Faruk G; Mukhopadhyay, Sudit S; Gupta, Prabhakar

    2002-02-01

    The PstI family of elements are short, highly repetitive DNA sequences interspersed throughout the genome of the Bovidae. We have cloned and sequenced some members of the PstI family from cattle, goat, and buffalo. These elements are approximately 500 bp, have a copy number of 2 x 10(5) - 4 x 10(5), and comprise about 4% of the haploid genome. Studies of nucleotide sequence homology indicate that the buffalo and goat PstI repeats (type II) are similar types of short interspersed nucleotide element (SINE) sequences, but the cattle PstI repeat (type I) is considerably more divergent. Additionally, the goat PstI sequence showed significant sequence homology with bovine serine tRNA, and is therefore likely derived from serine tRNA. Interestingly, Southern hybridization suggests that both types of SINEs (I and II) are present in all the species of Bovidae. Dendrogram analysis indicates that cattle PstI SINE is similar to bovine Alu-like SINEs. Goat and buffalo SINEs formed a separate cluster, suggesting that these two types of SINEs evolved separately in the genome of the Bovidae.

  13. Multilocus sequence typing of total-genome-sequenced bacteria.

    PubMed

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.

  14. Complete genome sequence of the bile-resistant pigment-producing anaerobe Alistipes finegoldii type strain (AHN2437T)

    PubMed Central

    Mavromatis, Konstantinos; Stackebrandt, Erko; Munk, Christine; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Huntemann, Marcel; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Rohde, Manfred; Gronow, Sabine; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2013-01-01

    Alistipes finegoldii Rautio et al. 2003 is one of five species of Alistipes with a validly published name: family Rikenellaceae, order Bacteroidetes, class Bacteroidia, phylum Bacteroidetes. This rod-shaped and strictly anaerobic organism has been isolated mostly from human tissues. Here we describe the features of the type strain of this species, together with the complete genome sequence, and annotation. A. finegoldii is the first member of the genus Alistipes for which the complete genome sequence of its type strain is now available. The 3,734,239 bp long single replicon genome with its 3,302 protein-coding and 68 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:23961309

  15. Complete Genome Sequencing of a Novel Type of Omikronpapillomavirus 1 in Indian River Lagoon Bottlenose Dolphins (Tursiops truncatus).

    PubMed

    Rodrigues, Thaís C S; Subramaniam, Kuttichantran; Cortés-Hinojosa, Galaxia; Wellehan, James F X; Ng, Terry Fei Fan; Delwart, Eric; McCulloch, Stephen D; Goldstein, Juli D; Schaefer, Adam M; Fair, Patricia A; Reif, John S; Bossart, Gregory D; Waltzek, Thomas B

    2018-04-26

    The genome sequence of a papillomavirus was determined from fecal samples collected from bottlenose dolphins in the Indian River Lagoon, FL. The genome was 7,772 bp and displayed a typical papillomavirus genome organization. Phylogenetic analysis supported the bottlenose dolphin papillomavirus as being a novel type of Omikronpapillomavirus 1 . Copyright © 2018 Rodrigues et al.

  16. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  17. Draft Genome Sequence of Streptomyces specialis Type Strain GW41-1564 (DSM 41924).

    PubMed

    Loucif, Lotfi; Michelle, Caroline; Terras, Jérôme; Rolain, Jean-Marc; Raoult, Didier; Fournier, Pierre-Edouard

    2017-03-30

    Here, we report the draft genome sequence of Streptomyces specialis type strain GW41-1564, which was isolated from soil. This 5.87-Mb genome exhibits a high G+C content of 72.72% and contains 5,486 protein-coding genes. Copyright © 2017 Loucif et al.

  18. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    PubMed Central

    Robins-Browne, Roy M.; Holt, Kathryn E.; Ingle, Danielle J.; Hocking, Dianna M.; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E.coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods. PMID:27917373

  19. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    PubMed

    Robins-Browne, Roy M; Holt, Kathryn E; Ingle, Danielle J; Hocking, Dianna M; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E .coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli . Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli , which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  20. Piscine reovirus: Genomic and molecular phylogenetic analysis from farmed and wild salmonids collected on the Canada/US Pacific Coast

    USGS Publications Warehouse

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Sakasida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period.

  1. Piscine Reovirus: Genomic and Molecular Phylogenetic Analysis from Farmed and Wild Salmonids Collected on the Canada/US Pacific Coast

    PubMed Central

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673

  2. Complete genome sequence of Clostridium pasteurianum NRRL B-598, a non-type strain producing butanol.

    PubMed

    Sedlar, Karel; Kolek, Jan; Skutkova, Helena; Branska, Barbora; Provaznik, Ivo; Patakova, Petra

    2015-11-20

    The strain Clostridium pasteurianum NRRL B-598 is non-type, oxygen tolerant, spore-forming, mesophilic and heterofermentative strain with high hydrogen production and ability of acetone-butanol fermentation (ethanol production being negligible). Here, we present the annotated complete genome sequence of this bacterium, replacing the previous draft genome assembly. The genome consisting of a single circular 6,186,879 bp chromosome with no plasmid was determined using PacBio RSII and Roche 454 sequencing. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

    PubMed

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulse-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak ex- and inclusion, diagnostics, surveillance, and forensic studies.

  4. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks

    PubMed Central

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S. K.; Mammel, Mark K.; Tarr, Phillip I.; Eppinger, Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulse-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak ex- and inclusion, diagnostics, surveillance, and forensic studies. PMID:27446025

  5. Draft genome sequence of Halomonas lutea strain YIM 91125 T (DSM 23508 T) isolated from the alkaline Lake Ebinur in Northwest China

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei

    Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIMmore » 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.« less

  6. Draft genome sequence of Halomonas lutea strain YIM 91125 T (DSM 23508 T) isolated from the alkaline Lake Ebinur in Northwest China

    DOE PAGES

    Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; ...

    2015-01-20

    Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIMmore » 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.« less

  7. Draft genome sequence of Staphylococcus aureus KT/312045, an ST1-MSSA PVL positive isolated from pus sample in East Coast Malaysia.

    PubMed

    Suhaili, Zarizal; Lean, Soo-Sum; Mohamad, Noor Muzamil; Rachman, Abdul R Abdul; Desa, Mohd Nasir Mohd; Yeo, Chew Chieng

    2016-09-01

    Most of the efforts in elucidating the molecular relatedness and epidemiology of Staphylococcus aureus in Malaysia have been largely focused on methicillin-resistant S. aureus (MRSA). Therefore, here we report the draft genome sequence of the methicillin-susceptible Staphylococcus aureus (MSSA) with sequence type 1 (ST1), spa type t127 with Panton-Valentine Leukocidin (pvl) pathogenic determinant isolated from pus sample designated as KT/314250 strain. The size of the draft genome is 2.86 Mbp with 32.7% of G + C content consisting 2673 coding sequences. The draft genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number AOCP00000000.

  8. Complete genome sequence of the sulfate-reducing firmicute Desulfotomaculum ruminis type strain (DLT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spring, Stefan; Visser, Michael; Lu, Megan

    2012-12-11

    Desulfotomaculum ruminis Campbell and Postgate 1965 is a member of the large genus Desulfotomaculum which contains 30 species and is contained in the family Peptococcaceae. This species is of interest because it represents one of the few sulfate- reducing bacteria that have been isolated from the rumen. Here we describe the features of D. ruminis together with the complete genome sequence and annotation. The 3,969,014 bp long chromosome with a total of 3,901 protein-coding and 85 RNA genes is the second completed genome sequence of a type strain of the genus Desulfotomaculum to be pub- lished, and was sequenced asmore » part of the DOE Joint Genome Institute Community Sequencing Program 2009.« less

  9. Analysis of the full genome of human group C rotaviruses reveals lineage diversification and reassortment.

    PubMed

    Medici, Maria Cristina; Tummolo, Fabio; Martella, Vito; Arcangeletti, Maria Cristina; De Conto, Flora; Chezzi, Carlo; Fehér, Enikő; Marton, Szilvia; Calderaro, Adriana; Bányai, Krisztián

    2016-08-01

    Group C rotaviruses (RVC) are enteric pathogens of humans and animals. Whole-genome sequences are available only for few RVCs, leaving gaps in our knowledge about their genetic diversity. We determined the full-length genome sequence of two human RVCs (PR2593/2004 and PR713/2012), detected in Italy from hospital-based surveillance for rotavirus infection in 2004 and 2012. In the 11 RNA genomic segments, the two Italian RVCs segregated within separate intra-genotypic lineages showed variation ranging from 1.9 % (VP6) to 15.9 % (VP3) at the nucleotide level. Comprehensive analysis of human RVC sequences available in the databases allowed us to reveal the existence of at least two major genome configurations, defined as type I and type II. Human RVCs of type I were all associated with the M3 VP3 genotype, including the Italian strain PR2593/2004. Conversely, human RVCs of type II were all associated with the M2 VP3 genotype, including the Italian strain PR713/2012. Reassortant RVC strains between these major genome configurations were identified. Although only a few full-genome sequences of human RVCs, mostly of Asian origin, are available, the analysis of human RVC sequences retrieved from the databases indicates that at least two intra-genotypic RVC lineages circulate in European countries. Gathering more sequence data is necessary to develop a standardized genotype and intra-genotypic lineage classification system useful for epidemiological investigations and avoiding confusion in the literature.

  10. Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ivanova, Natalia; Gronow, Sabine; Lapidus, Alla

    2009-05-20

    Leptotrichia buccalis (Robin 1853) Trevisan 1879 is the type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically adequately accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. Species of Leptotrichia are large fusiform non-motile, non-sporulating rods, which often populate the human oral flora. L. buccalis is anaerobic to aerotolerant, and saccharolytic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order 'Fusobacteriales' and no more than the second sequence from themore » phylum 'Fusobacteria'. The 2,465,610 bp long single replicon genome with its 2306 protein-coding and 61 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  11. Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ivanova, N; Gronow, Sabine; Lapidus, Alla L.

    2009-01-01

    Leptotrichia buccalis (Robin 1853) Trevisan 1879 is the type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically adequately accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. Species of Leptotrichia are large, fusiform, non-motile, non-sporulating rods, which often populate the human oral flora. L. buccalis is anaerobic to aerotolerant, and saccharolytic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order 'Fusobacteriales' and no more than the second sequence from themore » phylum 'Fusobacteria'. The 2,465,610 bp long single replicon genome with its 2306 protein-coding and 61 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  12. Draft Genome Sequence for ICMP 5702, the Type Strain of Pectobacterium carotovorum subsp. carotovorum That Causes Soft Rot Disease on Potato

    PubMed Central

    Lu, Ashley; Armstrong, Karen F.

    2015-01-01

    Pectobacterium species are economically important bacteria that cause soft rotting of potato tubers in the field and in storage. Here, we report the draft genome sequence of the type strain for P. carotovorum subsp. carotovorum, ICMP 5702 (ATCC 15713). The genome sequence of ICMP 5702 will provide an important reference for future phylogenomic and taxonomic studies of the phytopathogenic Enterobacteriaceae. PMID:26251498

  13. Draft genome sequence for virulent and avirulent strains of Xanthomonas arboricola isolated from Prunus spp. in Spain.

    PubMed

    Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M; Cubero, Jaime

    2016-01-01

    Xanthomonas arboricola is a species in genus Xanthomonas which is mainly comprised of plant pathogens. Among the members of this taxon, X. arboricola pv. pruni, the causal agent of bacterial spot disease of stone fruits and almond, is distributed worldwide although it is considered a quarantine pathogen in the European Union. Herein, we report the draft genome sequence, the classification, the annotation and the sequence analyses of a virulent strain, IVIA 2626.1, and an avirulent strain, CITA 44, of X. arboricola associated with Prunus spp. The draft genome sequence of IVIA 2626.1 consists of 5,027,671 bp, 4,720 protein coding genes and 50 RNA encoding genes. The draft genome sequence of strain CITA 44 consists of 4,760,482 bp, 4,250 protein coding genes and 56 RNA coding genes. Initial comparative analyses reveals differences in the presence of structural and regulatory components of the type IV pilus, the type III secretion system, the type III effectors as well as variations in the number of the type IV secretion systems. The genome sequence data for these strains will facilitate the development of molecular diagnostics protocols that differentiate virulent and avirulent strains. In addition, comparative genome analysis will provide insights into the plant-pathogen interaction during the bacterial spot disease process.

  14. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33T)

    PubMed Central

    Munk, A. Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C.; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2011-01-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The species is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21886861

  15. Non-contiguous finished genome sequence of the opportunistic oral pathogen Prevotella multisaccharivorax type strain (PPPA20T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pati, Amrita; Gronow, Sabine; Lu, Megan

    2011-01-01

    Prevotella multisaccharivorax Sakamoto et al. 2005 is a species of the large genus Prevotella, which belongs to the family Prevotellaceae. The species is of medical interest because its members are able to cause diseases in the human oral cavity such as periodontitis, root caries and others. Although 77 Prevotella genomes have already been sequenced or are targeted for sequencing, this is only the second completed genome sequence of a type strain of a species within the genus Prevotella to be published. The 3,388,644 bp long genome is assembled in three non-contiguous contigs, harbors 2,876 protein-coding and 75 RNA genes andmore » is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  16. Permanent draft genome sequence of the gliding predator Saprospira grandis strain Sa g1 (= HR1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K; Chertkov, Olga; Lapidus, Alla L.

    2012-01-01

    Saprospira grandis Gross et al. 1911 is a member of the Saprospiraceae, a family in the class 'Sphingobacteria' that remains poorly characterized at the genomic level. The species is known for preying on other marine bacteria via 'ixotrophy'. S. grandis strain Sa g1 was isolated from decaying crab carapace in France and was selected for genome sequencing because of its isolated location in the tree of life. Only one type strain genome has been published so far from the Saprospiraceae, while the sequence of strain Sa g1 represents the second genome to be published from a non-type strain of S.more » grandis. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,495,250 bp long Improved-High-Quality draft of the genome with its 3,536 protein-coding and 62 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  17. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    USDA-ARS?s Scientific Manuscript database

    Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of...

  18. Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936.

    PubMed

    Durrens, Pascal; Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J; Noël, Thierry

    2017-08-03

    Clavispora lusitaniae , an environmental saprophytic yeast belonging to the CTG clade of Candida , can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. Copyright © 2017 Durrens et al.

  19. Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936

    PubMed Central

    Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J.; Noël, Thierry

    2017-01-01

    ABSTRACT Clavispora lusitaniae, an environmental saprophytic yeast belonging to the CTG clade of Candida, can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. PMID:28774979

  20. Complete genome sequence of Sulfurimonas autotrophica type strain (OK10T)

    PubMed Central

    Sikorski, Johannes; Munk, Christine; Lapidus, Alla; Ngatchou Djao, Olivier Duplex; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Han, Cliff; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Sims, David; Meincke, Linda; Brettin, Thomas; Detter, John C.; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle as it oxidizes sulfur compounds to sulfate and by its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304749

  1. Complete Genome Sequence of Mycobacterium marinum ATCC 927T, Obtained Using Nanopore and Illumina Sequencing Technologies.

    PubMed

    Yoshida, Mitsunori; Fukano, Hanako; Miyamoto, Yuji; Shibayama, Keigo; Suzuki, Masato; Hoshino, Yoshihiko

    2018-05-17

    Mycobacterium marinum is a slowly growing, broad-host-range mycobacterial species. Here, we report the complete genome sequence of a Mycobacterium marinum type strain that was isolated from tubercles of diseased fish. This sequence will provide essential information for future taxonomic and comparative genome studies of its relatives. Copyright © 2018 Yoshida et al.

  2. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

    PubMed

    Cowley, Lauren A; Beckett, Stephen J; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J; Gally, David L; Jenkins, Claire

    2015-04-08

    Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations, certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidian cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group showing that these groups are specialised to infect a subset of phage types. Sequencing the typing phage has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type.

  3. Genome-wide-analyses of Listeria monocytogenes from food-processing plants reveal clonal diversity and date the emergence of persisting sequence types.

    PubMed

    Knudsen, Gitte M; Nielsen, Jesper Boye; Marvig, Rasmus L; Ng, Yin; Worning, Peder; Westh, Henrik; Gram, Lone

    2017-08-01

    Whole genome sequencing is increasing used in epidemiology, e.g. for tracing outbreaks of food-borne diseases. This requires in-depth understanding of pathogen emergence, persistence and genomic diversity along the food production chain including in food processing plants. We sequenced the genomes of 80 isolates of Listeria monocytogenes sampled from Danish food processing plants over a time-period of 20 years, and analysed the sequences together with 10 public available reference genomes to advance our understanding of interplant and intraplant genomic diversity of L. monocytogenes. Except for three persisting sequence types (ST) based on Multi Locus Sequence Typing being ST7, ST8 and ST121, long-term persistence of clonal groups was limited, and new clones were introduced continuously, potentially from raw materials. No particular gene could be linked to the persistence phenotype. Using time-based phylogenetic analyses of the persistent STs, we estimate the L. monocytogenes evolutionary rate to be 0.18-0.35 single nucleotide polymorphisms/year, suggesting that the persistent STs emerged approximately 100 years ago, which correlates with the onset of industrialization and globalization of the food market. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.

  4. Large-scale genomic analyses reveal the population structure and evolutionary trends of Streptococcus agalactiae strains in Brazilian fish farms.

    PubMed

    Barony, Gustavo M; Tavares, Guilherme C; Pereira, Felipe L; Carvalho, Alex F; Dorella, Fernanda A; Leal, Carlos A G; Figueiredo, Henrique C P

    2017-10-19

    Streptococcus agalactiae is a major pathogen and a hindrance on tilapia farming worldwide. The aims of this work were to analyze the genomic evolution of Brazilian strains of S. agalactiae and to establish spatial and temporal relations between strains isolated from different outbreaks of streptococcosis. A total of 39 strains were obtained from outbreaks and their whole genomes were sequenced and annotated for comparative analysis of multilocus sequence typing, genomic similarity and whole genome multilocus sequence typing (wgMLST). The Brazilian strains presented two sequence types, including a newly described ST, and a non-typeable lineage. The use of wgMLST could differentiate each strain in a single clone and was used to establish temporal and geographical correlations among strains. Bayesian phylogenomic analysis suggests that the studied Brazilian population was co-introduced in the country with their host, approximately 60 years ago. Brazilian strains of S. agalactiae were shown to be heterogeneous in their genome sequences and were distributed in different regions of the country according to their genotype, which allowed the use of wgMLST analysis to track each outbreak event individually.

  5. Complete genome sequence of Sanguibacter keddieii type strain (ST-74T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ivanova, Natalia; Sikorski, Johannes; Sims, David

    2009-05-20

    Sanguibacter keddieii is the type species of the genus Sanguibacter, the only described genus within the family of Sanguibacteraceae. Phylogenetically, this family is located in the neighbourhood of the genus Oerskovia and the family Cellulomonadaceae within the actinobacterial suborder Micrococcineae. The strain described in this report was isolated from blood of apparently healthy cows. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the family Sanguibacteraceae, and the 4,253,413 bp long single replicon genome with its 3735 protein-coding and 70 RNA genes is part ofmore » the Genomic Encyclopedia of Bacteria and Archaea project.« less

  6. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    PubMed

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  7. Complete Genome Sequence of Magnetospirillum gryphiswaldense MSR-1

    PubMed Central

    Wang, Xu; Wang, Qing; Zhang, Weijia; Wang, Yinjia; Li, Li; Wen, Tong; Zhang, Tongwei; Zhang, Yang; Xu, Jun; Hu, Junying; Li, Shuqi; Liu, Lingzi; Liu, Jinxin; Jiang, Wei; Tian, Jiesheng; Wang, Lei; Li, Jilun

    2014-01-01

    We report the complete genomic sequence of Magnetospirillum gryphiswaldense MSR-1 (DSM 6361), a type strain of the genus Magnetospirillum belonging to the Alphaproteobacteria. Compared to the reported draft sequence, extensive rearrangements and differences were found, indicating high genomic flexibility and “domestication” by accelerated evolution of the strain upon repeated passaging. PMID:24625872

  8. Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.

    PubMed

    Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J

    2013-01-01

    Staphylococcus aureus remains a leading cause of hospital-acquired infection but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. To outline the practicalities of whole genome sequencing and discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences and allows accurate characterisation of transmission events and outbreaks and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times and on creating a reliable standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  9. Genome sequence of Frateuria aurantia type strain (Kondô 67T), a xanthomonade isolated from Lilium auratium Lindl.

    PubMed Central

    Anderson, Iain; Teshima, Huzuki; Nolan, Matt; Lapidus, Alla; Tice, Hope; Del Rio, Tijana Glavina; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Rohde, Manfred; Lang, Elke; Detter, John C.; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2013-01-01

    Frateuria aurantia (ex Kondô and Ameyama 1958) Swings et al. 1980 is a member of the bispecific genus Frateuria in the family Xanthomonadaceae, which is already heavily targeted for non-type strain genome sequencing. Strain Kondô 67T was initially (1958) identified as a member of ‘Acetobacter aurantius’, a name that was not considered for the approved list. Kondô 67T was therefore later designated as the type strain of the newly proposed acetogenic species Frateuria aurantia. The strain is of interest because of its triterpenoids (hopane family). F. aurantia Kondô 67T is the first member of the genus Frateura whose genome sequence has been deciphered, and here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,603,458-bp long chromosome with its 3,200 protein-coding and 88 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:24501647

  10. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla L.

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completedmore » genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  11. Genome sequence of Frateuria aurantia type strain (Kondô 67T), a xanthomonade isolated from Lilium auratium Lindl.

    DOE PAGES

    Anderson, Iain; Teshima, Huzuki; Nolan, Matt; ...

    2013-10-16

    Frateuria aurantia (ex Kondô and Ameyama 1958) Swings et al. 1980 is a member of the bispecific genus Frateuria in the family Xanthomonadaceae, which is already heavily targeted for non-type strain genome sequencing. Strain Kondô 67 T was initially (1958) identified as a member of ‘Acetobacter aurantius’, a name that was not considered for the approved list. Kondô 67 T was therefore later designated as the type strain of the newly proposed acetogenic species Frateuria aurantia. The strain is of interest because of its triterpenoids (hopane family). F. aurantia Kondô 67 T is the first member of the genus Frateuramore » whose genome sequence has been deciphered, and here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,603,458-bp long chromosome with its 3,200 protein-coding and 88 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  12. Complete genome sequence of Campylobacter concisus ATCC 33237T and draft genome sequences for an additional eight well-characterized C. concisus strains

    USDA-ARS?s Scientific Manuscript database

    This report includes the complete genome of the Campylobacter concisus type strain ATCC 33237T and the draft genomes of eight additional well characterized C. concisus genomes. C. concisus has been shown to be a genetically heterogeneous species and these nine genomes provide valuable information re...

  13. Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms.

    PubMed

    Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro

    2010-04-27

    To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7 x the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.

  14. Complete genome sequence of Rhodothermus marinus type strain (R-10).

    PubMed

    Nolan, Matt; Tindall, Brian J; Pomrenke, Helga; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Han, Cliff; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Ovchinikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

    2009-12-29

    Rhodothermus marinus Alfredsson et al. 1995 is the type species of the genus and is of phylogenetic interest because the Rhodothermaceae represent the deepest lineage in the phylum Bacteroidetes. R. marinus R-10(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from marine hot springs off the coast of Iceland. Strain R-10(T) is strictly aerobic and requires slightly halophilic conditions for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Rhodothermus, and only the second sequence from members of the family Rhodothermaceae. The 3,386,737 bp genome (including a 125 kb plasmid) with its 2914 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Draft genome sequences of two closely related aflatoxigenic Aspergillus species obtained from the Ivory Coast

    USDA-ARS?s Scientific Manuscript database

    The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...

  16. Complete Genome Sequence of a CTX-M-15-Producing Escherichia coli Strain from the H30Rx Subclone of Sequence Type 131 from a Patient with Recurrent Urinary Tract Infections, Closely Related to a Lethal Urosepsis Isolate from the Patient’s Sister

    PubMed Central

    Johnson, Timothy J.; Liu, Cindy M.; Sokurenko, Evgeni; Kisiela, Dagmara I.; Paul, Sandip; Andersen, Paal; Johnson, James R.; Price, Lance B.

    2016-01-01

    We report here the complete genome sequence, including five plasmid sequences, of Escherichia coli sequence type 131 (ST131) strain JJ1887. The strain was isolated in 2007 in the United States from a patient with recurrent cystitis, whose caregiver sister died from urosepsis caused by a nearly identical strain. PMID:27174264

  17. Complete Genome Sequences of Two Methicillin-Resistant Staphylococcus haemolyticus Isolates of Multilocus Sequence Type 25, First Detected by Shotgun Metagenomics.

    PubMed

    Couto, Natacha; Chlebowicz, Monika A; Raangs, Erwin C; Friedrich, Alex W; Rossen, John W

    2018-04-05

    The emergence of nosocomial infections by multidrug-resistant Staphylococcus haemolyticus isolates has been reported in several European countries. Here, we report the first two complete genome sequences of S. haemolyticus sequence type 25 (ST25) isolates 83131A and 83131B. Both isolates were isolated from the same clinical sample and were first identified through shotgun metagenomics. Copyright © 2018 Couto et al.

  18. Efficient engineering of a bacteriophage genome using the type I-E CRISPR-Cas system.

    PubMed

    Kiro, Ruth; Shitrit, Dror; Qimron, Udi

    2014-01-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) system has recently been used to engineer genomes of various organisms, but surprisingly, not those of bacteriophages (phages). Here we present a method to genetically engineer the Escherichia coli phage T7 using the type I-E CRISPR-Cas system. T7 phage genome is edited by homologous recombination with a DNA sequence flanked by sequences homologous to the desired location. Non-edited genomes are targeted by the CRISPR-Cas system, thus enabling isolation of the desired recombinant phages. This method broadens CRISPR Cas-based editing to phages and uses a CRISPR-Cas type other than type II. The method may be adjusted to genetically engineer any bacteriophage genome.

  19. Complete genome sequence of the larval shellfish pathogen Vibrio Tubiashii type strain ATCC 19109

    USDA-ARS?s Scientific Manuscript database

    Vibrio tubiashii is a larval shellfish pathogen. Here we report the first closed genome sequence for this species (American Type Culture Collection type strain 19109), which has two chromosomes (3,294,490 and 1,766,582 bp), two megaplasmids (251,408 and 122,808 bp) and two plasmids (57,076 and 47,9...

  20. The full genome sequences of 8 equine herpesvirus type 4 isolates from horses in Japan.

    PubMed

    Izume, Satoko; Kirisawa, Rikio; Ohya, Kenji; Ohnuma, Aiko; Kimura, Takashi; Omatsu, Tsutomu; Katayama, Yukie; Mizutani, Tetsuya; Fukushi, Hideto

    2017-01-24

    Equine herpesvirus type 4 (EHV-4) is one of the most important pathogens in horses. To clarify the key genes of the EHV-4 genome that cause abortion in female horses, we determined the whole genome sequences of a laboratory strain and 7 Japanese EHV-4 isolates that were isolated from 2 aborted fetuses and nasal swabs of 5 horses with respiratory disease. The full genome sequences and predicted amino acid sequences of each gene of these isolates were compared with of the reference EHV-4 strain NS80567 and Australian isolates that were reported in 2015. The EHV-4 isolates clustered in 2 groups which did not reflect their pathogenicity. A comparison of the predicted amino acid sequences of the genes did not reveal any genes that were associated with EHV-4-induced abortion.

  1. Draft genome sequences of Streptococcus bovis strains ATCC 33317 and JB1

    USDA-ARS?s Scientific Manuscript database

    We report the draft genome sequences of Streptococcus bovis type strain ATTC 33317 (CVM42251) isolated from cow dung and strain JB1 (CVM42252) isolated from a cow rumen in 1977. Strains were subjected to Next Generation sequencing and the genome sizes are approximately 2 MB and 2.2 MB, respectively....

  2. Genome sequence of an aflatoxigenic pathogen of Argentinian peanut, Aspergillus arachidicola

    USDA-ARS?s Scientific Manuscript database

    In this study we sequenced the genome of the A. arachidicola Type strain (CBS 117610) and found its genome size to be 38.9 Mb, and its number of predicted genes to be 12,091, which are values comparable to those in other sequenced Aspergilli. Of its predicted genes, 691 were identified as unique to ...

  3. Epidemiological information is key when interpreting whole genome sequence data – lessons learned from a large Legionella pneumophila outbreak in Warstein, Germany, 2013

    PubMed Central

    Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian

    2017-01-01

    Introduction Whole genome sequencing (WGS) is increasingly used in Legionnaires’ disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak. PMID:29162202

  4. Epidemiological information is key when interpreting whole genome sequence data - lessons learned from a large Legionella pneumophila outbreak in Warstein, Germany, 2013.

    PubMed

    Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian

    2017-11-01

    IntroductionWhole genome sequencing (WGS) is increasingly used in Legionnaires' disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila . Methods : We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results : Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion : The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak.

  5. Vibrio cholerae typing phage N4: genome sequence and its relatedness to T7 viral supergroup.

    PubMed

    Das, Mayukh; Nandy, R K; Bhowmick, Tushar Suvra; Yamasaki, S; Ghosh, A; Nair, G B; Sarkar, B L

    2012-01-01

    In countries where cholera is endemic, Vibrio cholerae O1 bacteriophages have been detected in sewage water. These have been used to serve not only as strain markers, but also for the typing of V. cholerae strains. Vibriophage N4 (ATCC 51352-B1) occupies a unique position in the new phage-typing scheme and can infect a larger number of V. cholerae O1 biotype El Tor strains. Here we characterized the complete genome sequence of this typing vibriophage. The complete DNA sequence of the N4 genome was determined by using a shotgun sequencing approach. Complete genome sequence explored that phage N4 is comprised of one circular, double-stranded chromosome of 38,497 bp with an overall GC content of 42.8%. A total of 47 open reading frames were identified and functions could be assigned to 30 of them. Further, a close relationship with another vibriophage, VP4, and the enterobacteriophage T7 could be established. DNA-DNA hybridization among V. cholerae O1 and O139 phages revealed homology among O1 vibriophages at their genomic level. This study indicates two evolutionary distinctive branches of the possible phylogenetic origin of O1 and O139 vibriophages and provides an unveiled collection of information on viral gene products of typing vibriophages. Copyright © 2011 S. Karger AG, Basel.

  6. Complete genome sequence of Cellulophaga lytica type strain (LIM-21T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pati, Amrita; Abt, Birte; Teshima, Hazuki

    Cellulophaga lytica (Lewin 1969) Johansen et al. 1999 is the type species of the genus Cellulophaga, which belongs to the family Flavobacteriaceae within the phylum 'Bacteroidetes' and was isolated from marine beach mud in Limon, Costa Rica. The species is of biotechnological interest because its members produce a wide range of extracellular enzymes capable of degrading proteins and polysaccharides. After the genome sequence of Cellulophaga algicola this is the second completed genome sequence of a member of the genus Cellulophaga. The 3,765,936 bp long genome with its 3,303 protein-coding and 55 RNA genes consists of one circular chromosome and ismore » a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  7. Complete genome sequence of Nocardiopsis dassonvillei type strain (IMRU 509T)

    PubMed Central

    Sun, Hui; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Pagani, Ioanna; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Djao, Olivier Duplex Ngatchou; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Nocardiopsis dassonvillei (Brocq-Rousseau 1904) Meyer 1976 is the type species of the genus Nocardiopsis, which in turn is the type genus of the family Nocardiopsaceae. This species is of interest because of its ecological versatility. Members of N. dassonvillei have been isolated from a large variety of natural habitats such as soil and marine sediments, from different plant and animal materials as well as from human patients. Moreover, representatives of the genus Nocardiopsis participate actively in biopolymer degradation. This is the first complete genome sequence in the family Nocardiopsaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,543,312 bp long genome consist of a 5.77 Mbp chromosome and a 0.78 Mbp plasmid and with its 5,570 protein-coding and 77 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304737

  8. The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes.

    PubMed

    Sanitá Lima, Matheus; Woods, Laura C; Cartwright, Matthew W; Smith, David Roy

    2016-11-01

    Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  9. Draft Genome Sequences of Mycobacterium setense Type Strain DSM-45070 and the Nonpathogenic Strain Manresensis, Isolated from the Bank of the Cardener River in Manresa, Catalonia, Spain

    PubMed Central

    Vilaplana, Cristina; Velasco, Juan; Pluvinet, Raquel; Santín, Sheila; Prat, Cristina; Julián, Esther; Alcaide, Fernando; Comas, Iñaki; Sumoy, Lauro; Cardona, Pere-Joan

    2015-01-01

    We present here the draft genome sequences of two Mycobacterium setense strains. One of them corresponds to the M. setense type strain DSM-45070, originally isolated from a patient with a posttraumatic chronic skin abscess. The other one corresponds to the nonpathogenic M. setense strain Manresensis, isolated from the Cardener River crossing Manresa, Catalonia, Spain. A comparative genomic analysis shows a smaller genome size and fewer genes in M. setense strain Manresensis relative to those of the type strain, and it shows the genome segments unique to each strain. PMID:25657273

  10. The Genome Sequence of a Type ST239 Methicillin-Resistant Staphylococcus aureus Isolate from a Malaysian Hospital

    PubMed Central

    Lee, LS; Teh, LK; Zainuddin, ZF; Salleh, MZ

    2014-01-01

    We report the genome sequence of a healthcare-associated MRSA type ST239 clone isolated from a patient with septicemia in Malaysia. This clone typifies the characteristics of ST239 lineage, including resistance to multiple antibiotics and antiseptics. PMID:25197474

  11. Complete mitochondrial genome sequences of Brassica rapa (Chinese cabbage and mizuna), and intraspecific differentiation of cytoplasm in B. rapa and Brassica juncea.

    PubMed

    Hatono, Saki; Nishimura, Kaori; Murakami, Yoko; Tsujimura, Mai; Yamagishi, Hiroshi

    2017-09-01

    The complete sequence of the mitochondrial genome was determined for two cultivars of Brassica rapa . After determining the sequence of a Chinese cabbage variety, 'Oushou hakusai', the sequence of a mizuna variety, 'Chusei shiroguki sensuji kyomizuna', was mapped against the sequence of Chinese cabbage. The precise sequences where the two varieties demonstrated variation were ascertained by direct sequencing. It was found that the mitochondrial genomes of the two varieties are identical over 219,775 bp, with a single nucleotide polymorphism (SNP) between the genomes. Because B. rapa is the maternal species of an amphidiploid crop species, Brassica juncea , the distribution of the SNP was observed both in B. rapa and B. juncea . While the mizuna type SNP was restricted mainly to cultivars of mizuna (japonica group) in B. rapa , the mizuna type was widely distributed in B. juncea . The finding that the two Brassica species have these SNP types in common suggests that the nucleotide substitution occurred in wild B. rapa before both mitotypes were domesticated. It was further inferred that the interspecific hybridization between B. rapa and B. nigra took place twice and resulted in the two mitotypes of cultivated B. juncea .

  12. Complete genome sequence of Brachyspira murdochii type strain (56-150T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pati, Amrita; Sikorski, Johannes; Gronow, Sabine

    2010-01-01

    Brachyspira murdochii Stanton et al. 1992 is a non-pathogenic but host-associated spirochete of the family Brachyspiraceae. Initially isolated from the intestinal content of a healthy swine, the group B spirochaetes were first described under the basonym Serpulina murdochii. Members of the family Brachyspiraceae are of great phylogenetic interest because of the extremely isolated location of this family within the phylum Spirochaetes . Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a type strain of a member of the family Brachyspiraceaeand only the second genomemore » sequence from a member of the genus Brachyspira. The 3,241,804 bp long genome with its 2,893 protein-coding and 40 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  13. BrucellaBase: Genome information resource.

    PubMed

    Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-09-01

    Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Near-complete genome sequence of the cellulolytic Bacterium Bacteroides ( Pseudobacteroides) cellulosolvens ATCC 35603

    DOE PAGES

    Dassa, Bareket; Utturkar, Sagar M.; Hurt, Richard A.; ...

    2015-09-24

    We report the single-contig genome sequence of the anaerobic, mesophilic, cellulolytic bacterium, Bacteroides cellulosolvens. The bacterium produces a particularly elaborate cellulosome system, whereas the types of cohesin-dockerin interactions are opposite of other known cellulosome systems: cell-surface attachment is thus mediated via type-I interactions whereas enzymes are integrated via type-II interactions.

  15. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile.

    PubMed

    Bletz, Stefan; Janezic, Sandra; Harmsen, Dag; Rupnik, Maja; Mellmann, Alexander

    2018-06-01

    Clostridium difficile , recently renamed Clostridioides difficile , is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, frequently a standardized methodology and typing nomenclature are missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping ( n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyzing different cluster scenarios with cgMLST were concordant to published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative for the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange. Copyright © 2018 American Society for Microbiology.

  16. Pan-genome multilocus sequence typing and outbreak-specific reference-based single nucleotide polymorphism analysis to resolve two concurrent Staphylococcus aureus outbreaks in neonatal services.

    PubMed

    Roisin, S; Gaudin, C; De Mendonça, R; Bellon, J; Van Vaerenbergh, K; De Bruyne, K; Byl, B; Pouseele, H; Denis, O; Supply, P

    2016-06-01

    We used a two-step whole genome sequencing analysis for resolving two concurrent outbreaks in two neonatal services in Belgium, caused by exfoliative toxin A-encoding-gene-positive (eta+) methicillin-susceptible Staphylococcus aureus with an otherwise sporadic spa-type t209 (ST-109). Outbreak A involved 19 neonates and one healthcare worker in a Brussels hospital from May 2011 to October 2013. After a first episode interrupted by decolonization procedures applied over 7 months, the outbreak resumed concomitantly with the onset of outbreak B in a hospital in Asse, comprising 11 neonates and one healthcare worker from mid-2012 to January 2013. Pan-genome multilocus sequence typing, defined on the basis of 42 core and accessory reference genomes, and single-nucleotide polymorphisms mapped on an outbreak-specific de novo assembly were used to compare 28 available outbreak isolates and 19 eta+/spa-type t209 isolates identified by routine or nationwide surveillance. Pan-genome multilocus sequence typing showed that the outbreaks were caused by independent clones not closely related to any of the surveillance isolates. Isolates from only ten cases with overlapping stays in outbreak A, including four pairs of twins, showed no or only a single nucleotide polymorphism variation, indicating limited sequential transmission. Detection of larger genomic variation, even from the start of the outbreak, pointed to sporadic seeding from a pre-existing exogenous source, which persisted throughout the whole course of outbreak A. Whole genome sequencing analysis can provide unique fine-tuned insights into transmission pathways of complex outbreaks even at their inception, which, with timely use, could valuably guide efforts for early source identification. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  17. Reconstitution of wild type viral DNA in simian cells transfected with early and late SV40 defective genomes.

    PubMed

    O'Neill, F J; Gao, Y; Xu, X

    1993-11-01

    The DNAs of polyomaviruses ordinarily exist as a single circular molecule of approximately 5000 base pairs. Variants of SV40, BKV and JCV have been described which contain two complementing defective DNA molecules. These defectives, which form a bipartite genome structure, contain either the viral early region or the late region. The defectives have the unique property of being able to tolerate variable sized reiterations of regulatory and terminus region sequences, and portions of the coding region. They can also exchange coding region sequences with other polyomaviruses. It has been suggested that the bipartite genome structure might be a stage in the evolution of polyomaviruses which can uniquely sustain genome and sequence diversity. However, it is not known if the regulatory and terminus region sequences are highly mutable. Also, it is not known if the bipartite genome structure is reversible and what the conditions might be which would favor restoration of the monomolecular genome structure. We addressed the first question by sequencing the reiterated regulatory and terminus regions of E- and L-SV40 DNAs. This revealed a large number of mutations in the regulatory regions of the defective genomes, including deletions, insertions, rearrangements and base substitutions. We also detected insertions and base substitutions in the T-antigen gene. We addressed the second question by introducing into permissive simian cells, E- and L-SV40 genomes which had been engineered to contain only a single regulatory region. Analysis of viral DNA from transfected cells demonstrated recombined genomes containing a wild type monomolecular DNA structure. However, the complete defectives, containing reiterated regulatory regions, could often compete away the wild type genomes. The recombinant monomolecular genomes were isolated, cloned and found to be infectious. All of the DNA alterations identified in one of the regulatory regions of E-SV40 DNA were present in the recombinant monomolecular genomes. These and other findings indicate that the bipartite genome state can sustain many mutations which wtSV40 cannot directly sustain. However, the mutations can later be introduced into the wild type genomes when the E- and L-SV40 DNAs recombine to generate a new monomolecular genome structure.

  18. Genome sequence of the phylogenetically isolated spirochete Leptonema illini type strain (3055T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Stackebrandt, Erko; Held, Brittany

    2013-01-01

    Leptonema illini Hovind-Hougen 1979 is the type species of the genus Leptonema, family Leptospiraceae, phylum Spirochaetes. Organisms of this family have a Gram-negative-like cell enve- lope consisting of a cytoplasmic membrane and an outer membrane. The peptidoglycan layer is as- sociated with the cytoplasmic rather than the outer membrane. The two flagella of members of Leptospiraceae extend from the cytoplasmic membrane at the ends of the bacteria into the periplasmic space and are necessary for their motility. Here we describe the features of the L. illini type strain, together with the complete genome sequence, and annotation. This is the firstmore » genome sequence (finished at the level of Improved High Quality Draft) to be reported from of a member of the genus Leptonema and a representative of the third genus of the family Leptospiraceae for which complete or draft genome sequences are now available. The three scaffolds of the 4,522,760 bp draft genome sequence reported here, and its 4,230 protein-coding and 47 RNA genes are part of the Ge- nomic Encyclopedia of Bacteria and Archaea project.« less

  19. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)☆

    PubMed Central

    Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

    2013-01-01

    Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325

  20. The Genome sequences of four non-human/non-clinical Salmonella enterica serovar Kentucky ST198 isolates recovered between 1972 and 1973

    USDA-ARS?s Scientific Manuscript database

    Salmonella Kentucky is a polyphyletic member of S. enterica subclade A1 with multiple sequence types that often colonize the same hosts but in different frequencies on different continents. To evaluate the genomic features involved in S. Kentucky host specificity we sequenced the genomes of four iso...

  1. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

    PubMed

    Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

    2013-10-01

    Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

  2. Whole genome sequence and comparative analysis of Borrelia burgdorferi MM1

    PubMed Central

    Jabbari, Neda; Reddy, Panga Jaipal; Hood, Leroy

    2018-01-01

    Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain. PMID:29889842

  3. Use of Whole-Genus Genome Sequence Data To Develop a Multilocus Sequence Typing Tool That Accurately Identifies Yersinia Isolates to the Species and Subspecies Levels

    PubMed Central

    Hall, Miquette; Chattaway, Marie A.; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R.

    2014-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level to a level of resolution equal to that for whole-genome sequencing. Our assay is designed to be able to accurately subtype the important human-pathogenic species Yersinia enterocolitica to whole-genome resolution levels. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. PMID:25339391

  4. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231.

    PubMed

    Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M

    2018-02-14

    Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.

  5. Complete genome sequencing of a multidrug-resistant and human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 genotype

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calva, Edmundo; Silva, Claudia; Zaidi, Mussaret B.

    Salmonella enterica subsp. enterica serovar Typhimurium strain YU39 was isolated in 2005 in the state of Yucatán, Mexico, from a human systemic infection. The YU39 strain is representative of the multidrug-resistant emergent sequence type 213 (ST213) genotype. The YU39 complete genome is composed of a chromosome and seven plasmids.

  6. Complete genome sequencing of a multidrug-resistant and human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 genotype

    DOE PAGES

    Calva, Edmundo; Silva, Claudia; Zaidi, Mussaret B.; ...

    2015-06-18

    Salmonella enterica subsp. enterica serovar Typhimurium strain YU39 was isolated in 2005 in the state of Yucatán, Mexico, from a human systemic infection. The YU39 strain is representative of the multidrug-resistant emergent sequence type 213 (ST213) genotype. The YU39 complete genome is composed of a chromosome and seven plasmids.

  7. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    USDA-ARS?s Scientific Manuscript database

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  8. Nearing saturation of cancer driver gene discovery.

    PubMed

    Hsiehchen, David; Hsieh, Antony

    2018-06-15

    Extensive sequencing efforts of cancer genomes such as The Cancer Genome Atlas (TCGA) have been undertaken to uncover bona fide cancer driver genes which has enhanced our understanding of cancer and revealed therapeutic targets. However, the number of driver gene mutations is bounded, indicating that there must be a point when further sequencing efforts will be excessive. We found that there was a significant positive correlation between sample size and identified driver gene mutations across 33 cancers sequenced by the TCGA, which is expected if additional sequencing is still leading to the identification of more driver genes. However, the rate of new cancer driver genes being discovered with larger samples is declining rapidly. Our analysis provides a general guide for determining which cancer types would likely benefit from additional sequencing efforts, particularly those with relatively high rates of cancer driver gene discovery. Our results argue that past strategies of indiscriminately sequencing as many specimens as possible for all cancer types is becoming inefficient. In addition, without significant investments into applying our knowledge of cancer genomes, we risk sequencing more cancer genomes for the sake of sequencing rather than meaningful patient benefit.

  9. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munk, Christine; Lapidus, Alla L.; Lucas, Susan

    2011-01-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The spe- cies is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung in- fection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806more » bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Ge- nomic Encyclopedia of Bacteria and Archaea project.« less

  10. Draft Genome Sequence of Mycobacterium chimaera Type Strain Fl-0169

    EPA Science Inventory

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-016...

  11. Genome Sequence of the Symbiotic Type Strain Rhizobium tibeticum CCBAU85039T

    PubMed Central

    Wibberg, Daniel; Winkler, Anika; Ormeño-Orrillo, Ernesto; Martínez-Romero, Esperanza; Niehaus, Karsten; Pühler, Alfred; Kalinowski, Jörn; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2017-01-01

    ABSTRACT Rhizobium tibeticum was originally isolated from root nodules of Trigonella archiducis-nicolai grown in Tibet, China. This species is also able to nodulate Medicago sativa and Phaseolus vulgaris. The whole-genome sequence of the type strain, R. tibeticum CCBAU85039T, is reported in this study. PMID:28126941

  12. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    PubMed

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. The Organelle Genomes of Hassawi Rice (Oryza sativa L.) and Its Hybrid in Saudi Arabia: Genome Variation, Rearrangement, and Origins

    PubMed Central

    Zhang, Tongwu; Hu, Songnian; Zhang, Guangyu; Pan, Linlin; Zhang, Xiaowei; Al-Mssallem, Ibrahim S.; Yu, Jun

    2012-01-01

    Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNP (single nucleotide polymorphism) is present between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV) in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest length of 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicate ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta since genetic diversity between the two Hassawi cultivars is very low albeit an unknown historic origin of the wild-type Hassawi rice. PMID:22870184

  14. Draft genome sequence of Therminicola potens strain JR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  15. Complete Genome Sequence of Marinobacter flavimaris LMG 23834T, Which Is Potentially Useful in Bioremediation.

    PubMed

    Palau, Montserrat; Boujida, Nadia; Manresa, Àngels; Miñana-Galbis, David

    2018-04-19

    The complete genome sequence of the halophilic strain Marinobacter flavimaris LMG 23834 T is presented here. The genomic information of this type strain will be useful for taxonomic purposes and for its potential use in bioremediation studies. Copyright © 2018 Palau et al.

  16. Complete Genome Sequencing of a Multidrug-Resistant and Human-Invasive Salmonella enterica Serovar Typhimurium Strain of the Emerging Sequence Type 213 Genotype

    PubMed Central

    Calva, Edmundo; Zaidi, Mussaret B.; Sanchez-Flores, Alejandro; Estrada, Karel; Silva, Genivaldo G. Z.; Soto-Jiménez, Luz M.; Wiesner, Magdalena; Fernández-Mora, Marcos; Edwards, Robert A.

    2015-01-01

    Salmonella enterica subsp. enterica serovar Typhimurium strain YU39 was isolated in 2005 in the state of Yucatán, Mexico, from a human systemic infection. The YU39 strain is representative of the multidrug-resistant emergent sequence type 213 (ST213) genotype. The YU39 complete genome is composed of a chromosome and seven plasmids. PMID:26089426

  17. Draft genome sequence of the marine bacterium Streptomyces griseoaurantiacus M045, which produces novel manumycin-type antibiotics with a pABA core component.

    PubMed

    Li, Fuchao; Jiang, Peng; Zheng, Huajun; Wang, Shengyue; Zhao, Guoping; Qin, Song; Liu, Zhaopu

    2011-07-01

    Streptomyces griseoaurantiacus M045, isolated from marine sediment, produces manumycin and chinikomycin antibiotics. Here we present a high-quality draft genome sequence of S. griseoaurantiacus M045, the first marine Streptomyces species to be sequenced and annotated. The genome encodes several gene clusters for biosynthesis of secondary metabolites and has provided insight into genomic islands linking secondary metabolism to functional adaptation in marine S. griseoaurantiacus M045.

  18. Complete genome sequence of Rhodospirillum rubrum type strain (S1).

    PubMed

    Munk, A Christine; Copeland, Alex; Lucas, Susan; Lapidus, Alla; Del Rio, Tijana Glavina; Barry, Kerrie; Detter, John C; Hammon, Nancy; Israni, Sanjay; Pitluck, Sam; Brettin, Thomas; Bruce, David; Han, Cliff; Tapia, Roxanne; Gilna, Paul; Schmutz, Jeremy; Larimer, Frank; Land, Miriam; Kyrpides, Nikos C; Mavromatis, Konstantinos; Richardson, Paul; Rohde, Manfred; Göker, Markus; Klenk, Hans-Peter; Zhang, Yaoping; Roberts, Gary P; Reslewic, Susan; Schwartz, David C

    2011-07-01

    Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the most simple photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1(T) can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1(T) is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.

  19. Cronobacter, the emergent bacterial pathogen Enterobacter sakazakii comes of age; MLST and whole genome sequence analysis.

    PubMed

    Forsythe, Stephen J; Dickins, Benjamin; Jolley, Keith A

    2014-12-16

    Following the association of Cronobacter spp. to several publicized fatal outbreaks in neonatal intensive care units of meningitis and necrotising enterocolitis, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/) containing over 1000 isolates with metadata along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections Whole genome sequencing and multilocus sequence typing (MLST) has supports the formal recognition of the genus Cronobacter composed of seven species to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the association of C. sakazakii CC4 pathogenicity, and propensity for neonatal CNS.

  20. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    PubMed

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glyco- proteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  1. Genome sequence of Frateuria aurantia type strain (Kondo 67(T)), a xanthomonade isolated from Lilium auratium Lindl.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Teshima, Hazuki; Nolan, Matt

    2013-01-01

    rateuria aurantia (ex Kondo and Ameyama 1958) Swings et al. 1980 is a member of the bispecific genus Frateuria in the family Xanthomonadaceae, which is already heavily targeted for non-type strain genome sequencing. Strain Kondo 67(T) was initially (1958) identified as a member of 'Acetobacter aurantius', a name that was not considered for the approved list. Kondo 67(T) was therefore later designated as the type strain of the newly proposed acetogenic species Frateuria aurantia. The strain is of interest because of its triterpenoids (hopane family). F. aurantia Kondo 67(T) is the first member of the genus Frateura whose genome sequencemore » has been deciphered, and here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,603,458-bp long chromosome with its 3,200 protein-coding and 88 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  2. Complete genome sequence of the Phaeobacter gallaeciensis type strain CIP 105210(T) (= DSM 26640(T) = BS107(T)).

    PubMed

    Frank, Oliver; Pradella, Silke; Rohde, Manfred; Scheuner, Carmen; Klenk, Hans-Peter; Göker, Markus; Petersen, Jörn

    2014-06-15

    Phaeobacter gallaeciensis CIP 105210(T) (= DSM 26640(T) = BS107(T)) is the type strain of the species Phaeobacter gallaeciensis. The genus Phaeobacter belongs to the marine Roseobacter group (Rhodobacteraceae, Alphaproteobacteria). Phaeobacter species are effective colonizers of marine surfaces, including frequent associations with eukaryotes. Strain BS107(T) was isolated from a rearing of the scallop Pecten maximus. Here we describe the features of this organism, together with the complete genome sequence, comprising eight circular replicons with a total of 4,448 genes. In addition to a high number of extrachromosomal replicons, the genome contains six genomic island and three putative prophage regions, as well as a hybrid between a plasmid and a circular phage. Phylogenomic analyses confirm previous results, which indicated that the originally reported P. gallaeciensis type-strain deposit DSM 17395 belongs to P. inhibens and that CIP 105210(T) (= DSM 26640(T)) is the sole genome-sequenced representative of P. gallaeciensis.

  3. Complete genome sequence of Marivirga tractuosa type strain (H-43).

    PubMed

    Pagani, Ioanna; Chertkov, Olga; Lapidus, Alla; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Nolan, Matt; Saunders, Elizabeth; Pitluck, Sam; Held, Brittany; Goodwin, Lynne; Liolios, Konstantinos; Ovchinikova, Galina; Ivanova, Natalia; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Jeffries, Cynthia D; Detter, John C; Han, Cliff; Tapia, Roxanne; Ngatchou-Djao, Olivier D; Rohde, Manfred; Göker, Markus; Spring, Stefan; Sikorski, Johannes; Woyke, Tanja; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-04-29

    Marivirga tractuosa (Lewin 1969) Nedashkovskaya et al. 2010 is the type species of the genus Marivirga, which belongs to the family Flammeovirgaceae. Members of this genus are of interest because of their gliding motility. The species is of interest because representative strains show resistance to several antibiotics, including gentamicin, kanamycin, neomycin, polymixin and streptomycin. This is the first complete genome sequence of a member of the family Flammeovirgaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,511,574 bp long chromosome and the 4,916 bp plasmid with their 3,808 protein-coding and 49 RNA genes are a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Complete genome sequence of Marivirga tractuosa type strain (H-43T)

    PubMed Central

    Pagani, Ioanna; Chertkov, Olga; Lapidus, Alla; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Nolan, Matt; Saunders, Elizabeth; Pitluck, Sam; Held, Brittany; Goodwin, Lynne; Liolios, Konstantinos; Ovchinikova, Galina; Ivanova, Natalia; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Jeffries, Cynthia D.; Detter, John C.; Han, Cliff; Tapia, Roxanne; Ngatchou-Djao, Olivier D.; Rohde, Manfred; Göker, Markus; Spring, Stefan; Sikorski, Johannes; Woyke, Tanja; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2011-01-01

    Marivirga tractuosa (Lewin 1969) Nedashkovskaya et al. 2010 is the type species of the genus Marivirga, which belongs to the family Flammeovirgaceae. Members of this genus are of interest because of their gliding motility. The species is of interest because representative strains show resistance to several antibiotics, including gentamicin, kanamycin, neomycin, polymixin and streptomycin. This is the first complete genome sequence of a member of the family Flammeovirgaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,511,574 bp long chromosome and the 4,916 bp plasmid with their 3,808 protein-coding and 49 RNA genes are a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21677852

  5. Complete genome sequence of Staphylothermus hellenicus P8T

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phy- lum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shal- low hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the com- plete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein- coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) La- boratory Sequencing Program (LSP) project.

  6. Genome Sequence of the Symbiotic Type Strain Rhizobium tibeticum CCBAU85039T.

    PubMed

    Torres Tejerizo, Gonzalo; Wibberg, Daniel; Winkler, Anika; Ormeño-Orrillo, Ernesto; Martínez-Romero, Esperanza; Niehaus, Karsten; Pühler, Alfred; Kalinowski, Jörn; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2017-01-26

    Rhizobium tibeticum was originally isolated from root nodules of Trigonella archiducis-nicolai grown in Tibet, China. This species is also able to nodulate Medicago sativa and Phaseolus vulgaris The whole-genome sequence of the type strain, R. tibeticum CCBAU85039 T , is reported in this study. Copyright © 2017 Torres Tejerizo et al.

  7. Draft Genome Sequence of Chryseobacterium sp. JV274 Isolated from Maize Rhizosphere

    PubMed Central

    Vacheron, Jordan; Dubost, Audrey; Chapulliot, David; Prigent-Combaret, Claire

    2017-01-01

    ABSTRACT We report the draft genome sequence of Chryseobacterium sp. JV274. This strain was isolated from the rhizosphere of maize during a greenhouse experiment. JV274 harbors genes involved in flexirubin production (darA and darB genes), bacterial competition (type VI secretion system), and gliding (bacterial motility; type IX secretion system). PMID:28408666

  8. Complete genome sequence of Coriobacterium glomerans type strain (PW2T) from the midgut of Pyrrhocoris apterus L. (red soldier bug)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stackebrandt, Erko; Zeytun, Ahmet; Lapidus, Alla L.

    2013-01-01

    Coriobacterium glomerans Haas and Ko nig 1988, is the only species of the genus Coriobacterium, family Coriobacteriaceae, order Coriobacteriales, phylum Actinobacteria. The bacterium thrives as an endosymbiont of pyrrhocorid bugs, i.e. the red fire bug Pyrrhocoris apterus L. The rationale for sequencing the genome of strain PW2T is its endosymbiotic life style which is rare among members of Actinobacteria. Here we describe the features of this symbiont, together with the complete genome sequence and its annotation. This is the first complete genome sequence of a member of the genus Coriobacterium and the sixth member of the order Coriobacteriales for whichmore » complete genome sequences are now available. The 2,115,681 bp long single replicon genome with its 1,804 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  9. Legionella oakridgensis ATCC 33761 genome sequence and phenotypic characterization reveals its replication capacity in amoebae.

    PubMed

    Brzuszkiewicz, Elzbieta; Schulz, Tino; Rydzewski, Kerstin; Daniel, Rolf; Gillmaier, Nadine; Dittmann, Christine; Holland, Gudrun; Schunder, Eva; Lautner, Monika; Eisenreich, Wolfgang; Lück, Christian; Heuner, Klaus

    2013-12-01

    Legionella oakridgensis is able to cause Legionnaires' disease, but is less virulent compared to L. pneumophila strains and very rarely associated with human disease. L. oakridgensis is the only species of the family legionellae which is able to grow on media without additional cysteine. In contrast to earlier publications, we found that L. oakridgensis is able to multiply in amoebae. We sequenced the genome of L. oakridgensis type strain OR-10 (ATCC 33761). The genome is smaller than the other yet sequenced Legionella genomes and has a higher G+C-content of 40.9%. L. oakridgensis lacks a flagellum and it also lacks all genes of the flagellar regulon except of the alternative sigma-28 factor FliA and the anti-sigma-28 factor FlgM. Genes encoding structural components of type I, type II, type IV Lvh and type IV Dot/Icm, Sec- and Tat-secretion systems could be identified. Only a limited set of Dot/Icm effector proteins have been recognized within the genome sequence of L. oakridgensis. Like in L. pneumophila strains, various proteins with eukaryotic motifs and eukaryote-like proteins were detected. We could demonstrate that the Dot/Icm system is essential for intracellular replication of L. oakridgensis. Furthermore, we identified new putative virulence factors of Legionella. Copyright © 2013 Elsevier GmbH. All rights reserved.

  10. Genome Sequences of Mycobacteriophages Amgine, Amohnition, Bella96, Cain, DarthP, Hammy, Krueger, LastHope, Peanam, PhelpsODU, Phrank, SirPhilip, Slimphazie, and Unicorn

    PubMed Central

    Anders, Kirk R.; Mavrodi, Dmitri V.; Vazquez, Edwin; Amoh, Nana Yaa A.; Baliraine, Frederick N.; Buchser, William J.; Cast, Thomas P.; Chamberlain, Carmen E.; Chung, Hui-Min; D’Angelo, William A.; Farris, Christian T.; Fernandez-Martinez, Mariceli; Fischman, Haley D.; Forsyth, Mark H.; Fortier, Anna G.; Gallo, Kara F.; Held, Greta J.; Lomas, Miguel A.; Maldonado-Vazquez, Natalia Y.; Moonsammy, Claudia H.; Namboote, Peace; Paudel, Sudip; Reyes, Gabriella M.; Rubin, Michael R.; Saha, Margaret S.; Stukey, Joseph; Tobias, Tristan D.; Garlena, Rebecca A.; Stoner, Ty H.; Russell, Daniel A.

    2017-01-01

    ABSTRACT We report the genome sequences of 14 cluster K mycobacteriophages isolated using Mycobacterium smegmatis mc²155 as host. Four are closely related to subcluster K1 phages, and 10 are members of subcluster K6. The phage genomes span considerable sequence diversity, including multiple types of integrases and integration sites. PMID:29217790

  11. Draft Genome Sequences of 510 Listeria monocytogenes Strains from Food Isolates and Human Listeriosis Cases from Northern Italy.

    PubMed

    Lomonaco, Sara; Gallina, Silvia; Filipello, Virginia; Sanchez Leon, Maria; Kastanis, George John; Allard, Marc; Brown, Eric; Amato, Ettore; Pontello, Mirella; Decastelli, Lucia

    2018-01-18

    Listeriosis outbreaks are frequently multistate/multicountry outbreaks, underlining the importance of molecular typing data for several diverse and well-characterized isolates. Large-scale whole-genome sequencing studies on Listeria monocytogenes isolates from non-U.S. locations have been limited. Herein, we describe the draft genome sequences of 510 L. monocytogenes isolates from northern Italy from different sources.

  12. Stress-induced rearrangement of Fusarium retrotransposon sequences.

    PubMed

    Anaya, N; Roncero, M I

    1996-11-27

    Rearrangement of fusarium oxysporum retrotransposon skippy was induced by growth in the presence of potassium chlorate. Three fungal strains, one sensitive to chlorate (Co60) and two resistant to chlorate and deficient for nitrate reductase (Co65 and Co94), were studied by Southern analysis of their genomic DNA. Polymorphism was detected in their hybridization banding pattern, relative to the wild type grown in the absence of chlorate, using various enzymes with or without restriction sites within the retrotransposon. Results were consistent with the assumption that three different events had occurred in strain Co60: genomic amplification of skippy yielding tandem arrays of the element, generation of new skippy sequences, and deletion of skippy sequences. Amplification of Co60 genomic DNA using the polymerase chain reaction and divergent primers derived from the retrotransposon generated a new band, corresponding to one long terminal repeat plus flanking sequences, that was not present in the wild-type strain. Molecular analysis of nitrate reductase-deficient mutants showed that generation and deletion of skippy sequences, but not genomic amplification in tandem repeats, had occurred in their genomes.

  13. Complete genome sequence of Thermosphaera aggregans type strain (M11TL).

    PubMed

    Spring, Stefan; Rachel, Reinhard; Lapidus, Alla; Davenport, Karen; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia C; Brettin, Thomas; Detter, John C; Tapia, Roxanne; Han, Cliff; Heimerl, Thomas; Weikl, Fabian; Brambilla, Evelyne; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-06-15

    Thermosphaera aggregans Huber et al. 1998 is the type species of the genus Thermosphaera, which comprises at the time of writing only one species. This species represents archaea with a hyperthermophilic, heterotrophic, strictly anaerobic and fermentative phenotype. The type strain M11TL(T) was isolated from a water-sediment sample of a hot terrestrial spring (Obsidian Pool, Yellowstone National Park, Wyoming). Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,316,595 bp long single replicon genome with its 1,410 protein-coding and 47 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Intrasporangium calvum type strain (7 KIPT)

    PubMed Central

    Del Rio, Tijana Glavina; Chertkov, Olga; Yasawong, Montri; Lucas, Susan; Deshpande, Shweta; Cheng, Jan-Fang; Detter, Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Pukall, Rüdiger; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2010-01-01

    Intrasporangium calvum Kalakoutskii et al. 1967 is the type species of the genus Intrasporangium, which belongs to the actinobacterial family Intrasporangiaceae. The species is a Gram-positive bacterium that forms a branching mycelium, which tends to break into irregular fragments. The mycelium of this strain may bear intercalary vesicles but does not contain spores. The strain described in this study is an airborne organism that was isolated from a school dining room in 1967. One particularly interesting feature of I. calvum is that the type of its menaquinone is different from all other representatives of the family Intrasporangiaceae. This is the first completed genome sequence from a member of the genus Intrasporangium and also the first sequence from the family Intrasporangiaceae. The 4,024,382 bp long genome with its 3,653 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304734

  15. How Primary Care Providers Talk to Patients about Genome Sequencing Results: Risk, Rationale, and Recommendation.

    PubMed

    Vassy, Jason L; Davis, J Kelly; Kirby, Christine; Richardson, Ian J; Green, Robert C; McGuire, Amy L; Ubel, Peter A

    2018-06-01

    Genomics will play an increasingly prominent role in clinical medicine. To describe how primary care physicians (PCPs) discuss and make clinical recommendations about genome sequencing results. Qualitative analysis. PCPs and their generally healthy patients undergoing genome sequencing. Patients received clinical genome reports that included four categories of results: monogenic disease risk variants (if present), carrier status, five pharmacogenetics results, and polygenic risk estimates for eight cardiometabolic traits. Patients' office visits with their PCPs were audio-recorded, and summative content analysis was used to describe how PCPs discussed genomic results. For each genomic result discussed in 48 PCP-patient visits, we identified a "take-home" message (recommendation), categorized as continuing current management, further treatment, further evaluation, behavior change, remembering for future care, or sharing with family members. We analyzed how PCPs came to each recommendation by identifying 1) how they described the risk or importance of the given result and 2) the rationale they gave for translating that risk into a specific recommendation. Quantitative analysis showed that continuing current management was the most commonly coded recommendation across results overall (492/749, 66%) and for each individual result type except monogenic disease risk results. Pharmacogenetics was the most common result type to prompt a recommendation to remember for future care (94/119, 79%); carrier status was the most common type prompting a recommendation to share with family members (45/54, 83%); and polygenic results were the most common type prompting a behavior change recommendation (55/58, 95%). One-fifth of recommendation codes associated with monogenic results were for further evaluation (6/24, 25%). Rationales for these recommendations included patient context, family context, and scientific/clinical limitations of sequencing. PCPs distinguish substantive differences among categories of genome sequencing results and use clinical judgment to justify continuing current management in generally healthy patients with genomic results.

  16. Complete genome sequence of the filamentous gliding predatory bacterium Herpetosiphon aurantiacus type strain (114-95T)

    PubMed Central

    Kiss, Hajnalka; Nett, Markus; Domin, Nicole; Martin, Karin; Maresca, Julia A.; Copeland, Alex; Lapidus, Alla; Lucas, Susan; Berry, Kerrie W.; Glavina Del Rio, Tijana; Dalin, Eileen; Tice, Hope; Pitluck, Sam; Richardson, Paul; Bruce, David; Goodwin, Lynne; Han, Cliff; Detter, John C.; Schmutz, Jeremy; Brettin, Thomas; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Ivanova, Natalia; Göker, Markus; Woyke, Tanja; Klenk, Hans-Peter; Bryant, Donald A.

    2011-01-01

    Herpetosiphon aurantiacus Holt and Lewin 1968 is the type species of the genus Herpetosiphon, which in turn is the type genus of the family Herpetosiphonaceae, type family of the order Herpetosiphonales in the phylum Chloroflexi. H. aurantiacus cells are organized in filaments which can rapidly glide. The species is of interest not only because of its rather isolated position in the tree of life, but also because Herpetosiphon ssp. were identified as predators capable of facultative predation by a wolf pack strategy and of degrading the prey organisms by excreted hydrolytic enzymes. The genome of H. aurantiacus strain 114-95T is the first completely sequenced genome of a member of the family Herpetosiphonaceae. The 6,346,587 bp long chromosome and the two 339,639 bp and 99,204 bp long plasmids with a total of 5,577 protein-coding and 77 RNA genes was sequenced as part of the DOE Joint Genome Institute Program DOEM 2005. PMID:22675585

  17. Full genome sequence analysis of a novel adenovirus of rhesus macaque origin indicates a new simian adenovirus type and species.

    PubMed

    Malouli, Daniel; Howell, Grant L; Legasse, Alfred W; Kahl, Christoph; Axthelm, Michael K; Hansen, Scott G; Früh, Klaus

    2014-09-01

    Multiple novel simian adenoviruses have been isolated over the past years and their potential to cross the species barrier and infect the human population is an ever present threat. Here we describe the isolation and full genome sequencing of a novel simian adenovirus (SAdV) isolated from the urine of two independent, never co-housed, late stage simian immunodeficiency virus (SIV)-infected rhesus macaques. The viral genome sequences revealed a novel type with a unique genome length, GC content, E3 region and DNA polymerase amino acid sequence that is sufficiently distinct from all currently known human- or simian adenovirus species to warrant classifying these isolates as a novel species of simian adenovirus. This new species, termed Simian mastadenovirus D (SAdV-D), displays the standard genome organization for the genus Mastadenovirus containing only one copy of the fiber gene which sets it apart from the old world monkey adenovirus species HAdV-G, SAdV-B and SAdV-C.

  18. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    PubMed Central

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595

  19. Complete genome sequence of Actinosynnema mirum type strain (101T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actino-synnemataceae, a rapidly growing family within the actinobacterial suborder Pseudo-nocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together withmore » the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  20. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  1. Typing and comparative genome analysis of Brucella melitensis isolated from Lebanon.

    PubMed

    Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima

    2017-10-16

    Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. DNA methylation at hepatitis B viral integrants is associated with methylation at flanking human genomic sequences

    PubMed Central

    Watanabe, Yoshiyuki; Yamamoto, Hiroyuki; Oikawa, Ritsuko; Toyota, Minoru; Yamamoto, Masakazu; Kokudo, Norihiro; Tanaka, Shinji; Arii, Shigeki; Yotsuyanagi, Hiroshi; Koike, Kazuhiko; Itoh, Fumio

    2015-01-01

    Integration of DNA viruses into the human genome plays an important role in various types of tumors, including hepatitis B virus (HBV)–related hepatocellular carcinoma. However, the molecular details and clinical impact of HBV integration on either human or HBV epigenomes are unknown. Here, we show that methylation of the integrated HBV DNA is related to the methylation status of the flanking human genome. We developed a next-generation sequencing-based method for structural methylation analysis of integrated viral genomes (denoted G-NaVI). This method is a novel approach that enables enrichment of viral fragments for sequencing using unique baits based on the sequence of the HBV genome. We detected integrated HBV sequences in the genome of the PLC/PRF/5 cell line and found variable levels of methylation within the integrated HBV genomes. Allele-specific methylation analysis revealed that the HBV genome often became significantly methylated when integrated into highly methylated host sites. After integration into unmethylated human genome regions such as promoters, however, the HBV DNA remains unmethylated and may eventually play an important role in tumorigenesis. The observed dynamic changes in DNA methylation of the host and viral genomes may functionally affect the biological behavior of HBV. These findings may impact public health given that millions of people worldwide are carriers of HBV. We also believe our assay will be a powerful tool to increase our understanding of the various types of DNA virus-associated tumorigenesis. PMID:25653310

  3. Genome sequencing identifies Listeria fleischmannii subsp. coloradonensis subsp. nov., isolated from a ranch.

    PubMed

    den Bakker, Henk C; Manuel, Clyde S; Fortes, Esther D; Wiedmann, Martin; Nightingale, Kendra K

    2013-09-01

    Twenty Listeria-like isolates were obtained from environmental samples collected on a cattle ranch in northern Colorado; all of these isolates were found to share an identical partial sigB sequence, suggesting close relatedness. The isolates were similar to members of the genus Listeria in that they were Gram-stain-positive, short rods, oxidase-negative and catalase-positive; the isolates were similar to Listeria fleischmannii because they were non-motile at 25 °C. 16S rRNA gene sequencing for representative isolates and whole genome sequencing for one isolate was performed. The genome of the type strain of Listeria fleischmannii (strain LU2006-1(T)) was also sequenced. The draft genomes were very similar in size and the average MUMmer nucleotide identity across 91% of the genomes was 95.16%. Genome sequence data were used to design primers for a six-gene multi-locus sequence analysis (MLSA) scheme. Phylogenies based on (i) the near-complete 16S rRNA gene, (ii) 31 core genes and (iii) six housekeeping genes illustrated the close relationship of these Listeria-like isolates to Listeria fleischmannii LU2006-1(T). Sufficient genetic divergence of the Listeria-like isolates from the type strain of Listeria fleischmannii and differing phenotypic characteristics warrant these isolates to be classified as members of a distinct infraspecific taxon, for which the name Listeria fleischmannii subsp. coloradonensis subsp. nov. is proposed. The type strain is TTU M1-001(T) ( =BAA-2414(T) =DSM 25391(T)). The isolates of Listeria fleischmannii subsp. coloradonensis subsp. nov. differ from the nominate subspecies by the inability to utilize melezitose, turanose and sucrose, and the ability to utilize inositol. The results also demonstrate the utility of whole genome sequencing to facilitate identification of novel taxa within a well-described genus. The genomes of both subspecies of Listeria fleischmannii contained putative enhancin genes; the Listeria fleischmannii subsp. coloradonensis subsp. nov. genome also encoded a putative mosquitocidal toxin. The presence of these genes suggests possible adaptation to an insect host, and further studies are needed to probe niche adaptation of Listeria fleischmannii.

  4. The Applied Development of a Tiered Multilocus Sequence Typing (MLST) Scheme for Dichelobacter nodosus.

    PubMed

    Blanchard, Adam M; Jolley, Keith A; Maiden, Martin C J; Coffey, Tracey J; Maboni, Grazieli; Staley, Ceri E; Bollard, Nicola J; Warry, Andrew; Emes, Richard D; Davies, Peers L; Tötemeyer, Sabine

    2018-01-01

    Dichelobacter nodosus ( D. nodosus ) is the causative pathogen of ovine footrot, a disease that has a significant welfare and financial impact on the global sheep industry. Previous studies into the phylogenetics of D. nodosus have focused on Australia and Scandinavia, meaning the current diversity in the United Kingdom (U.K.) population and its relationship globally, is poorly understood. Numerous epidemiological methods are available for bacterial typing; however, few account for whole genome diversity or provide the opportunity for future application of new computational techniques. Multilocus sequence typing (MLST) measures nucleotide variations within several loci with slow accumulation of variation to enable the designation of allele numbers to determine a sequence type. The usage of whole genome sequence data enables the application of MLST, but also core and whole genome MLST for higher levels of strain discrimination with a negligible increase in experimental cost. An MLST database was developed alongside a seven loci scheme using publically available whole genome data from the sequence read archive. Sequence type designation and strain discrimination was compared to previously published data to ensure reproducibility. Multiple D. nodosus isolates from U.K. farms were directly compared to populations from other countries. The U.K. isolates define new clades within the global population of D. nodosus and predominantly consist of serogroups A, B and H, however serogroups C, D, E, and I were also found. The scheme is publically available at https://pubmlst.org/dnodosus/.

  5. Complete genome sequence of Hippea maritima type strain (MH2T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Lu, Megan; Nolan, Matt

    2011-01-01

    Hippea maritima (Miroshnichenko et al. 1999) is the type species of the genus Hippea, which belongs to the family Desulfurellaceae within the class Deltaproteobacteria. The anaerobic, moderately thermophilic marine sulfur-reducer was first isolated from shallow-water hot vents in Matipur Harbor, Papua New Guinea. H. maritima was of interest for genome se- quencing because of its isolated phylogenetic location, as a distant next neighbor of the ge- nus Desulfurella. Strain MH2T is the first type strain from the order Desulfurellales with a com- pletely sequenced genome. The 1,694,430 bp long linear genome with its 1,723 protein- coding and 57 RNA genesmore » consists of one circular chromosome and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  6. Methyltransferases acquired by lactococcal 936-type phage provide protection against restriction endonuclease activity.

    PubMed

    Murphy, James; Klumpp, Jochen; Mahony, Jennifer; O'Connell-Motherway, Mary; Nauta, Arjen; van Sinderen, Douwe

    2014-10-01

    So-called 936-type phages are among the most frequently isolated phages in dairy facilities utilising Lactococcus lactis starter cultures. Despite extensive efforts to control phage proliferation and decades of research, these phages continue to negatively impact cheese production in terms of the final product quality and consequently, monetary return. Whole genome sequencing and in silico analysis of three 936-type phage genomes identified several putative (orphan) methyltransferase (MTase)-encoding genes located within the packaging and replication regions of the genome. Utilising SMRT sequencing, methylome analysis was performed on all three phages, allowing the identification of adenine modifications consistent with N-6 methyladenine sequence methylation, which in some cases could be attributed to these phage-encoded MTases. Heterologous gene expression revealed that M.Phi145I/M.Phi93I and M.Phi93DAM, encoded by genes located within the packaging module, provide protection against the restriction enzymes HphI and DpnII, respectively, representing the first functional MTases identified in members of 936-type phages. SMRT sequencing technology enabled the identification of the target motifs of MTases encoded by the genomes of three lytic 936-type phages and these MTases represent the first functional MTases identified in this species of phage. The presence of these MTase-encoding genes on 936-type phage genomes is assumed to represent an adaptive response to circumvent host encoded restriction-modification systems thereby increasing the fitness of the phages in a dynamic dairy environment.

  7. Microbial genomic taxonomy

    PubMed Central

    2013-01-01

    A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and > 70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups. PMID:24365132

  8. Non contiguous-finished genome sequence and description of Senegalemassilia anaerobia gen. nov., sp. nov.

    PubMed Central

    Lagier, Jean-Christophe; Elkarkouri, Khalid; Rivet, Romain; Couderc, Carine; Raoult, Didier; Fournier, Pierre-Edouard

    2013-01-01

    Senegalemassilia anaerobia strain JC110T sp.nov. is the type strain of Senegalemassilia anaerobia gen. nov., sp. nov., the type species of a new genus within the Coriobacteriaceae family, Senegalemassilia gen. nov. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. S. anaerobia is a Gram-positive anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,383,131 bp long genome contains 1,932 protein-coding and 58 RNA genes. PMID:24019984

  9. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez, Robert J; Bruce, David; Detter, J C

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  10. Complete genome sequence of Rahnella aquatilis CIP 78.65.

    PubMed

    Martinez, Robert J; Bruce, David; Detter, Chris; Goodwin, Lynne A; Han, James; Han, Cliff S; Held, Brittany; Land, Miriam L; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A

    2012-06-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  11. Genome sequence of the exopolysaccharide-producing Salipiger mucosus type strain (DSM 16094(T)), a moderately halophilic member of the Roseobacter clade.

    PubMed

    Riedel, Thomas; Spring, Stefan; Fiebig, Anne; Petersen, Jörn; Kyrpides, Nikos C; Göker, Markus; Klenk, Hans-Peter

    2014-06-15

    Salipiger mucosus Martínez-Cànovas et al. 2004 is the type species of the genus Salipiger, a moderately halophilic and exopolysaccharide-producing representative of the Roseobacter lineage within the alphaproteobacterial family Rhodobacteraceae. Members of this family were shown to be the most abundant bacteria especially in coastal and polar waters, but were also found in microbial mats and sediments. Here we describe the features of the S. mucosus strain DSM 16094(T) together with its genome sequence and annotation. The 5,689,389-bp genome sequence consists of one chromosome and several extrachromosomal elements. It contains 5,650 protein-coding genes and 95 RNA genes. The genome of S. mucosus DSM 16094(T) was sequenced as part of the activities of the Transregional Collaborative Research Center 51 (TRR51) funded by the German Research Foundation (DFG).

  12. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

    PubMed Central

    Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L

    2006-01-01

    Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76. 2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10-2 per base pair of DNA. On an average, 60.33% of the SNPs were synonymous type while 39.67% were non-synonymous type. The mutation frequency, primarily in the form of external INDELs was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions. Conclusion INDELs and strain specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of the X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851

  13. Phenotypic H-Antigen Typing by Mass Spectrometry Combined with Genetic Typing of H Antigens, O Antigens, and Toxins by Whole-Genome Sequencing Enhances Identification of Escherichia coli Isolates.

    PubMed

    Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua

    2016-08-01

    Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene cluster based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.

  14. Molecular characterization of the complete genome of falconid herpesvirus strain S-18

    USDA-ARS?s Scientific Manuscript database

    Falconid herpesvirus type 1 (FHV-1) is the causative agent of falcon inclusion body disease, an acute, highly contagious disease of raptors. The complete nucleotide sequence of the genome of FHV-1 has been determined. The genome is arranged as a D-type genome with large inverted repeats flanking a ...

  15. Complete genome sequence of the haloalkaliphilic, obligately chemolithoautotrophic thiosulfate and sulfide-oxidizing γ-proteobacterium Thioalkalimicrobium cyclicum type strain ALM 1 (DSM 14477 T)

    DOE PAGES

    Kappler, Ulrike; Davenport, Karen W.; Beatson, Scott; ...

    2016-06-03

    Thioalkalimicrobium cyclicum (Sorokin et al. 2002) is a member of the family Piscirickettsiaceae in the order Thiotrichales. The -proteobacterium belongs to the colourless sulfur-oxidizing bacteria isolated from saline soda lakes with stable alkaline pH, such as Lake Mono (California) and Soap Lake (Washington State). Strain ALM 1 T is characterized by its adaptation to life in the oxic/anoxic interface towards the less saline aerobic waters (mixolimnion) of the stable stratified alkaline salt lakes. Strain ALM 1 T is the first representative of the genus Thioalkalimicrobium whose genome sequence has been deciphered and the fourth genome sequence of a type strainmore » of the Piscirickettsiaceae to be published. As a result, the 1,932,455 bp long chromosome with its 1,684 protein-coding and 50 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2008.« less

  16. Complete genome sequence of the haloalkaliphilic, obligately chemolithoautotrophic thiosulfate and sulfide-oxidizing γ-proteobacterium Thioalkalimicrobium cyclicum type strain ALM 1 (DSM 14477 T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kappler, Ulrike; Davenport, Karen W.; Beatson, Scott

    Thioalkalimicrobium cyclicum (Sorokin et al. 2002) is a member of the family Piscirickettsiaceae in the order Thiotrichales. The -proteobacterium belongs to the colourless sulfur-oxidizing bacteria isolated from saline soda lakes with stable alkaline pH, such as Lake Mono (California) and Soap Lake (Washington State). Strain ALM 1 T is characterized by its adaptation to life in the oxic/anoxic interface towards the less saline aerobic waters (mixolimnion) of the stable stratified alkaline salt lakes. Strain ALM 1 T is the first representative of the genus Thioalkalimicrobium whose genome sequence has been deciphered and the fourth genome sequence of a type strainmore » of the Piscirickettsiaceae to be published. As a result, the 1,932,455 bp long chromosome with its 1,684 protein-coding and 50 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2008.« less

  17. The genetic architecture of type 2 diabetes.

    PubMed

    Fuchsberger, Christian; Flannick, Jason; Teslovich, Tanya M; Mahajan, Anubha; Agarwala, Vineeta; Gaulton, Kyle J; Ma, Clement; Fontanillas, Pierre; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Denis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; van der Schouw, Yvonne T; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeriya; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana C N; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Burtt, Noël P; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Florez, Jose C; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Boehnke, Michael; Altshuler, David; McCarthy, Mark I

    2016-08-04

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

  18. The genetic architecture of type 2 diabetes

    PubMed Central

    Ma, Clement; Fontanillas, Pierre; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Denis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; van der Schouw, Yvonne T; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeriya; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana C N; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Burtt, Noël P; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Florez, Jose C; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Boehnke, Michael; Altshuler, David; McCarthy, Mark I

    2016-01-01

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes. PMID:27398621

  19. High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1T (DSM 17521T) isolated from muddy waters of a drainage system in Chandigarh, India

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mukherjee, Supratim; Lapidus, Alla; Shapiro, Nicole

    2015-01-01

    Pontibacter roseus Suresh et al 2006 is a member of genus Pontibacter family Cytophagaceae, class Cytophagia. While the type species of the genus Pontibacter actiniarum was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats ranging from seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1T along with its complete genome sequence and annotation from a culture of DSM 17521T. The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50more » RNA genes and is a part of Genomic encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project.« less

  20. Genome sequence of the Thermotoga thermarum type strain (LA3(T)) from an African solfataric spring.

    PubMed

    Göker, Markus; Spring, Stefan; Scheuner, Carmen; Anderson, Iain; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Tice, Hope; Del Rio, Tijana Glavina; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2014-06-15

    Thermotoga thermarum Windberger et al. 1989 is a member to the genomically well characterized genus Thermotoga in the phylum 'Thermotogae'. T. thermarum is of interest for its origin from a continental solfataric spring vs. predominantly marine oil reservoirs of other members of the genus. The genome of strain LA3T also provides fresh data for the phylogenomic positioning of the (hyper-)thermophilic bacteria. T. thermarum strain LA3(T) is the fourth sequenced genome of a type strain from the genus Thermotoga, and the sixth in the family Thermotogaceae to be formally described in a publication. Phylogenetic analyses do not reveal significant discrepancies between the current classification of the group, 16S rRNA gene data and whole-genome sequences. Nevertheless, T. thermarum significantly differs from other Thermotoga species regarding its iron-sulfur cluster synthesis, as it contains only a minimal set of the necessary proteins. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,039,943 bp long chromosome with its 2,015 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Genome sequence of the Thermotoga thermarum type strain (LA 3 T) from an African solfataric spring

    DOE PAGES

    Goker, Markus; Spring, Stefan; Scheuner, Carmen; ...

    2014-06-15

    Thermotoga thermarum Windberger et al. 1989 is a member to the genomically well characterized genus Thermotoga in the phylum ' Thermotogae'. T. thermarum is of interest for its origin from a continental solfataric spring vs. predominantly marine oil reservoirs of other members of the genus. The genome of strain LA3T also provides fresh data for the phylogenomic positioning of the (hyper-)thermophilic bacteria. T. thermarum strain LA3 T is the fourth sequenced genome of a type strain from the genus Thermotoga, and the sixth in the family Thermotogaceae to be formally described in a publication. Phylogenetic analyses do not reveal significantmore » discrepancies between the current classification of the group, 16S rRNA gene data and whole-genome sequences. Nevertheless, T. thermarum significantly differs from other Thermotoga species regarding its iron-sulfur cluster synthesis, as it contains only a minimal set of the necessary proteins. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,039,943 bp long chromosome with its 2,015 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  2. Genome Sequences of Mycobacteriophages Amgine, Amohnition, Bella96, Cain, DarthP, Hammy, Krueger, LastHope, Peanam, PhelpsODU, Phrank, SirPhilip, Slimphazie, and Unicorn.

    PubMed

    Anders, Kirk R; Barekzi, Nazir; Best, Aaron A; Frederick, Gregory D; Mavrodi, Dmitri V; Vazquez, Edwin; Amoh, Nana Yaa A; Baliraine, Frederick N; Buchser, William J; Cast, Thomas P; Chamberlain, Carmen E; Chung, Hui-Min; D'Angelo, William A; Farris, Christian T; Fernandez-Martinez, Mariceli; Fischman, Haley D; Forsyth, Mark H; Fortier, Anna G; Gallo, Kara F; Held, Greta J; Lomas, Miguel A; Maldonado-Vazquez, Natalia Y; Moonsammy, Claudia H; Namboote, Peace; Paudel, Sudip; Polley, Sarah-Elizabeth M; Reyes, Gabriella M; Rubin, Michael R; Saha, Margaret S; Stukey, Joseph; Tobias, Tristan D; Garlena, Rebecca A; Stoner, Ty H; Cresawn, Steven G; Jacobs-Sera, Deborah; Pope, Welkin H; Russell, Daniel A; Hatfull, Graham F

    2017-12-07

    We report the genome sequences of 14 cluster K mycobacteriophages isolated using Mycobacterium smegmatis mc²155 as host. Four are closely related to subcluster K1 phages, and 10 are members of subcluster K6. The phage genomes span considerable sequence diversity, including multiple types of integrases and integration sites. Copyright © 2017 Anders et al.

  3. Draft Genome Sequence of the First New Delhi Metallo-β-Lactamase (NDM-1)-Producing Escherichia coli Strain Isolated in Peru.

    PubMed

    Tamariz, Jesus; Llanos, Carlos; Seas, Carlos; Montenegro, Paola; Lagos, Jose; Fernandes, Miriam R; Cerdeira, Louise; Lincopan, Nilton

    2018-03-29

    We present here the draft genome sequence of the first New Delhi metallo-β-lactamase (NDM-1)-producing Escherichia coli strain, belonging to sequence type 155 (ST155), isolated in Peru. Assembly of this draft genome resulted in 5,061,184 bp, revealing a clinically significant resistome for β-lactams, aminoglycosides, tetracyclines, phenicols, sulfonamides, trimethoprim, and fluoroquinolones. Copyright © 2018 Tamariz et al.

  4. Complete Genome Sequence of the Electricity-Producing “Thermincola potens” Strain JR▿

    PubMed Central

    Byrne-Bailey, Kathryne G.; Wrighton, Kelly C.; Melnyk, Ryan A.; Agbo, Peter; Hazen, Terry C.; Coates, John D.

    2010-01-01

    “Thermincola potens” strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR. PMID:20525829

  5. Genome Sequence of the Thermophile Bacillus coagulans Hammer, the Type Strain of the Species

    PubMed Central

    Su, Fei; Tao, Fei; Tang, Hongzhi

    2012-01-01

    Here we announce a 3.0-Mb assembly of the Bacillus coagulans Hammer strain, which is the type strain of the species within the genus Bacillus. Genomic analyses based on the sequence may provide insights into the phylogeny of the species and help to elucidate characteristics of the poorly studied strains of Bacillus coagulans. PMID:23105047

  6. Genome sequence of the thermophile Bacillus coagulans Hammer, the type strain of the species.

    PubMed

    Su, Fei; Tao, Fei; Tang, Hongzhi; Xu, Ping

    2012-11-01

    Here we announce a 3.0-Mb assembly of the Bacillus coagulans Hammer strain, which is the type strain of the species within the genus Bacillus. Genomic analyses based on the sequence may provide insights into the phylogeny of the species and help to elucidate characteristics of the poorly studied strains of Bacillus coagulans.

  7. Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.

    PubMed

    Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph

    2013-01-25

    Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.

  8. Great expectations: patient perspectives and anticipated utility of non-diagnostic genomic-sequencing results.

    PubMed

    Hylind, Robyn; Smith, Maureen; Rasmussen-Torvik, Laura; Aufox, Sharon

    2018-01-01

    The management of secondary findings is a challenge to health-care providers relaying clinical genomic-sequencing results to patients. Understanding patients' expectations from non-diagnostic genomic sequencing could help guide this management. This study interviewed 14 individuals enrolled in the eMERGE (Electronic Medical Records and Genomics) study. Participants in eMERGE consent to undergo non-diagnostic genomic sequencing, receive results, and have results returned to their physicians. The interviews assessed expectations and intended use of results. The majority of interviewees were male (64%) and 43% identified as non-Caucasian. A unique theme identified was that many participants expressed uncertainty about the type of diseases they expected to receive results on, what results they wanted to learn about, and how they intended to use results. Participant uncertainty highlights the complex nature of deciding to undergo genomic testing and a deficiency in genomic knowledge. These results could help improve how genomic sequencing and secondary findings are discussed with patients.

  9. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

    PubMed

    Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

    2010-09-16

    Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection.

  10. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing

    PubMed Central

    2010-01-01

    Background Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. Results In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. Conclusion A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection. PMID:20846365

  11. Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods

    PubMed Central

    2016-01-01

    Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units–variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications. PMID:27709842

  12. Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods.

    PubMed

    Ei, Phyu Win; Aung, Wah Wah; Lee, Jong Seok; Choi, Go Eun; Chang, Chulhun L

    2016-11-01

    Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units-variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications.

  13. Draft genome sequences of 64 swine associated LA-MRSA ST5 isolates from the USA

    USDA-ARS?s Scientific Manuscript database

    Methicillin resistant Staphylococcus aureus colonizes humans and other animals such as swine. LA-MRSA sequence type (ST) 5 isolates are a public concern due to their pathogenicity and ability to acquire mobile genetic elements. This report presents draft genome sequences for 64 LA-MRSA ST5 isolates ...

  14. Organizational heterogeneity of vertebrate genomes.

    PubMed

    Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

    2012-01-01

    Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.

  15. Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals.

    PubMed

    Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T G; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D; Sollid, Johanna U Ericson

    2014-11-01

    Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.

  16. Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals

    PubMed Central

    Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T. G.; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D.; Sollid, Johanna U. Ericson

    2014-01-01

    Objectives Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Methods Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. Results SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A–G), of which four (A–D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. Conclusions The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. PMID:25038069

  17. Complete genome sequence of Oceanithermus profundus type strain (506T)

    PubMed Central

    Pati, Amrita; Zhang, Xiaojing; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Jeffries, Cynthia D.; Brambilla, Evelyne-Marie; Röhl, Alina; Mwirichia, Romano; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Wirth, Reinhard; Göker, Markus; Woyke, Tanja; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Land, Miriam

    2011-01-01

    Oceanithermus profundus Miroshnichenko et al. 2003 is the type species of the genus Oceanithermus, which belongs to the family Thermaceae. The genus currently comprises two species whose members are thermophilic and are able to reduce sulfur compounds and nitrite. The organism is adapted to the salinity of sea water, is able to utilize a broad range of carbohydrates, some proteinaceous substrates, organic acids and alcohols. This is the first completed genome sequence of a member of the genus Oceanithermus and the fourth sequence from the family Thermaceae. The 2,439,291 bp long genome with its 2,391 protein-coding and 54 RNA genes consists of one chromosome and a 135,351 bp long plasmid, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21677858

  18. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    PubMed

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant challenge to practical application.

  19. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application. PMID:27792722

  20. Complete genome sequence of the moderately thermophilic mineral-sulfide-oxidizing firmicute Sulfobacillus acidophilus type strain (NAL(T)).

    PubMed

    Anderson, Iain; Chertkov, Olga; Chen, Amy; Saunders, Elizabeth; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Rohde, Manfred; Pukall, Rüdiger; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Mavromatis, Konstantinos

    2012-07-30

    Sulfobacillus acidophilus Norris et al. 1996 is a member of the genus Sulfobacillus which comprises five species of the order Clostridiales. Sulfobacillus species are of interest for comparison to other sulfur and iron oxidizers and also have biomining applications. This is the first completed genome sequence of a type strain of the genus Sulfobacillus, and the second published genome of a member of the species S. acidophilus. The genome, which consists of one chromosome and one plasmid with a total size of 3,557,831 bp harbors 3,626 protein-coding and 69 RNA genes, and is a part of the GenomicEncyclopedia ofBacteria andArchaea project.

  1. Genomic Sequencing of Bordetella pertussis for Epidemiology and Global Surveillance of Whooping Cough.

    PubMed

    Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain

    2018-06-01

    Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly undistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.

  2. Complete genome sequence of Halogeometricum borinquense type strain (PR3T)

    PubMed Central

    Malfatti, Stephanie; Tindall, Brian J.; Schneider, Susanne; Fähnrich, Regine; Lapidus, Alla; LaButtii, Kurt; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Anderson, Iain; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D’haeseleer, Patrik; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Chain, Patrick

    2009-01-01

    Halogeometricum borinquense Montalvo-Rodríguez et al. 1998 is the type species of the genus, and is of phylogenetic interest because of its distinct location between the halobacterial genera Haloquadratum and Halosarcina. H. borinquense requires extremely high salt (NaCl) concentrations for growth. It can not only grow aerobically but also anaerobically using nitrate as electron acceptor. The strain described in this report is a free-living, motile, pleomorphic, euryarchaeon, which was originally isolated from the solar salterns of Cabo Rojo, Puerto Rico. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the halobacterial genus Halogeometricum, and this 3,944,467 bp long six replicon genome with its 3937 protein-coding and 57 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304651

  3. Complete genome sequence of Pedobacter heparinus type strain (HIM 762-3T)

    PubMed Central

    Han, Cliff; Spring, Stefan; Lapidus, Alla; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia C.; Saunders, Elizabeth; Chertkov, Olga; Brettin, Thomas; Göker, Markus; Rohde, Manfred; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Detter, John C.

    2009-01-01

    Pedobacter heparinus (Payza and Korn 1956) Steyn et al. 1998 comb. nov. is the type species of the rapidly growing genus Pedobacter within the family Sphingobacteriaceae of the phylum ‘Bacteroidetes’. P. heparinus is of interest, because it was the first isolated strain shown to grow with heparin as sole carbon and nitrogen source and because it produces several enzymes involved in the degradation of mucopolysaccharides. All available data about this species are based on a sole strain that was isolated from dry soil. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first report on a complete genome sequence of a member of the genus Pedobacter, and the 5,167,383 bp long single replicon genome with its 4287 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304637

  4. Complete genome sequence of Leadbetterella byssophila type strain (4M15T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abt, Birte; Teshima, Hazuki; Lucas, Susan

    2011-01-01

    Leadbetterella byssophila Weon et al. 2005 is the type species of the genus Leadbetterella of the family Cytophagaceae in the phylum Bacteroidetes. Members of the phylum Bacteroidetes are widely distributed in nature, especially in aquatic environments. They are of special interest for their ability to degrade complex biopolymers. L. byssophila occupies a rather isolated position in the tree of life and is characterized by its ability to hydrolyze starch and gelatine, but not agar, cellulose or chitin. Here we describe the features of this organism, together with the complete genome sequence, and annotation. L. byssophila is already the 16th membermore » of the family Cytophagaceae whose genome has been sequenced. The 4,059,653 bp long single replicon genome with its 3,613 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  5. Complete genome sequence of Halogeometricum borinquense type strain (PR 3 T)

    DOE PAGES

    Malfatti, Stephanie; Tindall, Brian J.; Schneider, Susanne; ...

    2009-09-29

    Halogeometricum borinquense Montalvo-Rodríguez et al. 1998 is the type species of the genus, and is of phylogenetic interest because of its distinct location between the halobacterial genera Haloquadratum and Halosarcina. H. borinquense requires extremely high salt (NaCl) concentrations for growth. It can not only grow aerobically but also anaerobically using nitrate as electron acceptor. The strain described in this report is a free-living, motile, pleomorphic, euryarchaeon, which was originally isolated from the solar salterns of Cabo Rojo, Puerto Rico. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first completemore » genome sequence of the halobacterial genus Halogeometricum, and this 3,944,467 bp long six replicon genome with its 3937 protein-coding and 57 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  6. Complete genome sequence of Brachybacterium faecium type strain (Schefferle 6-10T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapidus, Alla; Pukall, Rudiger; LaButti, Kurt

    2009-05-20

    Brachybacterium faecium Collins et al. 1988 is the type species of the genus, and is of phylogenetic interest because of its location in the Dermabacteraceae, a rather isolated family within the actinobacterial suborder Micrococcineae. B. faecium is known for its rod-coccus growth cycle and the ability to degrade uric acid. It grows aerobically or weakly anaerobically. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from poultry deep litter. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a membermore » of the actinobacterial family Dermabacteraceae, and the 3,614,992 bp long single replicon genome with its 3129 protein-coding and 69 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  7. Complete Genome Sequence of Staphylococcus aureus XN108, an ST239-MRSA-SCCmec III Strain with Intermediate Vancomycin Resistance Isolated in Mainland China

    PubMed Central

    Zhang, Xia; Xu, Xiaomeng; Yuan, Wenchang; Hu, Qiwen; Shang, Weilong; Hu, Xiaomei

    2014-01-01

    ST239-MRSA-SCCmec III (ST239, sequence type 239; MRSA, methicillin-resistant Staphylococcus aureus; SCCmec III, staphylococcal cassette chromosome mec type III) is the most predominant clone of hospital-acquired methicillin-resistant S. aureus in mainland China. We report here the complete genome sequence of XN108, the first vancomycin-intermediate S. aureus strain isolated from a steam-burned patient with a wound infection. PMID:25059856

  8. Complete genome sequence of Hirschia baltica type strain (IFAM 1418T)

    PubMed Central

    Chertkov, Olga; Brown, Pamela J.B.; Kysela, David T.; de Pedro, Miguel A.; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Detter, John C.; Han, Cliff; Larimer, Frank; Chang, Yun-juan; Jeffries, Cynthia D.; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Ivanova, Natalia; Ovchinnikova, Galina; Tindall, Brian J.; Göker, Markus; Klenk, Hans-Peter; Brun, Yves V.

    2011-01-01

    The family Hyphomonadaceae within the Alphaproteobacteria is largely comprised of bacteria isolated from marine environments with striking morphologies and an unusual mode of cell growth. Here, we report the complete genome sequence Hirschia baltica, which is only the second a member of the Hyphomonadaceae with a published genome sequence. H. baltica is of special interest because it has a dimorphic life cycle and is a stalked, budding bacterium. The 3,455,622 bp long chromosome and 84,492 bp plasmid with a total of 3,222 protein-coding and 44 RNA genes were sequenced as part of the DOE Joint Genome Institute Program CSP 2008. PMID:22675580

  9. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy.

    PubMed

    Meier-Kolthoff, Jan P; Hahnke, Richard L; Petersen, Jörn; Scheuner, Carmen; Michael, Victoria; Fiebig, Anne; Rohde, Christine; Rohde, Manfred; Fartmann, Berthold; Goodwin, Lynne A; Chertkov, Olga; Reddy, Tbk; Pati, Amrita; Ivanova, Natalia N; Markowitz, Victor; Kyrpides, Nikos C; Woyke, Tanja; Göker, Markus; Klenk, Hans-Peter

    2014-01-01

    Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain was until now forgotten from microbial genomics. As a part of the G enomic E ncyclopedia of B acteria and A rchaea project, we here describe the features of E. coli DSM 30083(T) together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp containing genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083(T) in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. The phylotypes do not express a uniform level of character divergence as measured using dDDH, however, thus an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.

  10. Detection of porcine circovirus type 2 in pigs imported from Indonesia.

    PubMed

    Manokaran, Gayathri; Lin, Yueh-Nuo; Soh, Moi-Lien; Lim, Elizabeth Ai-Sim; Lim, Chee-Wee; Tan, Boon-Huan

    2008-11-25

    We have detected the presence of porcine circovirus (PCV) type 2 in Indonesian pigs imported to Singapore for food consumption. A total of three viral isolates were identified, and to genetically characterise them further, their full genomes were sequenced. Each genome showed a typical organization of PCV type 2, with the three isolates sharing similar genome lengths of 1767 nucleotide (nt) at high nt identities of 99.8-100%, further indicating that the viral isolates were quite homogeneous. Sequence analysis further revealed that the ORF2 genes contain the nt sequence CCCCGC (from nt position 262 to 267) that was previously reported to be associated with PCV type 2, group 1C. The phylogenetic tree was constructed for the ORF2 genes, and the PCV type 2 isolates distributed into two distinctive groups. The Indonesian PCV type 2 clustered tightly with one China isolate, accession number AY035820, as a sub-cluster in group 1C. The sequence and phylogenetic analyses both confirmed that the three Indonesian PCV type 2 isolates belong to group 1C, and that the genetic changes for the three Indonesian isolates were very stable, possibly due to the low-scale evolution.

  11. Genome Sequence of the Hemolytic-Uremic Syndrome-Causing Strain Escherichia coli NCCP15647

    PubMed Central

    Jeong, Haeyoung; Zhao, Fumei; Igori, Davaajargal; Oh, Kyung-Hwan; Kim, Seon-Young; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Song, Ju Yeon; Yu, Dong Su; Park, Mi-Sun

    2012-01-01

    Enterohemorrhagic Escherichia coli (EHEC) causes a disease involving diarrhea, hemorrhagic colitis, and hemolytic-uremic syndrome (HUS). Here we present the draft genome sequence of NCCP15647, an EHEC isolate from an HUS patient. Its genome exhibits features of EHEC, such as genes for verotoxins, a type III secretion system, and prophages. PMID:22740672

  12. Rapid Threat Organism Recognition Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Kelly P.; Solberg, Owen D.; Schoeniger, Joseph S.

    2013-05-07

    The RAPTOR computational pipeline identifies microbial nucleic acid sequences present in sequence data from clinical samples. It takes as input raw short-read genomic sequence data (in particular, the type generated by the Illumina sequencing platforms) and outputs taxonomic evaluation of detected microbes in various human-readable formats. This software was designed to assist in the diagnosis or characterization of infectious disease, by detecting pathogen sequences in nucleic acid sequence data from clinical samples. It has also been applied in the detection of algal pathogens, when algal biofuel ponds became unproductive. RAPTOR first trims and filters genomic sequence reads based on qualitymore » and related considerations, then performs a quick alignment to the human (or other host) genome to filter out host sequences, then performs a deeper search against microbial genomes. Alignment to a protein sequence database is optional. Alignment results are summarized and placed in a taxonomic framework using the Lowest Common Ancestor algorithm.« less

  13. Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function.

    PubMed

    Mehrotra, Shweta; Goyal, Vinod

    2014-08-01

    Repetitive DNA sequences are a major component of eukaryotic genomes and may account for up to 90% of the genome size. They can be divided into minisatellite, microsatellite and satellite sequences. Satellite DNA sequences are considered to be a fast-evolving component of eukaryotic genomes, comprising tandemly-arrayed, highly-repetitive and highly-conserved monomer sequences. The monomer unit of satellite DNA is 150-400 base pairs (bp) in length. Repetitive sequences may be species- or genus-specific, and may be centromeric or subtelomeric in nature. They exhibit cohesive and concerted evolution caused by molecular drive, leading to high sequence homogeneity. Repetitive sequences accumulate variations in sequence and copy number during evolution, hence they are important tools for taxonomic and phylogenetic studies, and are known as "tuning knobs" in the evolution. Therefore, knowledge of repetitive sequences assists our understanding of the organization, evolution and behavior of eukaryotic genomes. Repetitive sequences have cytoplasmic, cellular and developmental effects and play a role in chromosomal recombination. In the post-genomics era, with the introduction of next-generation sequencing technology, it is possible to evaluate complex genomes for analyzing repetitive sequences and deciphering the yet unknown functional potential of repetitive sequences. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  14. Existence of host-related DNA sequences in the schistosome genome.

    PubMed

    Iwamura, Y; Irie, Y; Kominami, R; Nara, T; Yasuraoka, K

    1991-06-01

    DNA sequences homologous to the mouse intracisternal A particle and endogenous type C retrovirus were detected in the DNAs of Schistosoma japonicum adults and S. mansoni eggs. Furthermore, other kinds of repetitive sequences in the host genome such as mouse type 1 Alu sequence (B1), mouse type 2 Alu sequence (B2) and mo-2 sequence, a mouse mini-satellite, were also detected in the DNAs from adults and eggs of S. japonicum and eggs of S. mansoni. Almost all of the sequences described above were absent in the DNAs of S. mansoni adults. The DNA fingerprints of schistosomes, using the mo-2 sequence, were indistinguishable from each other and resembled those of their murine hosts. Moreover, the mo-2 sequence was hypermethylated in the DNAs of schistosomes and its amount was variable in them. These facts indicate that host-related sequences are actually present in schistosomes and that the mo-2 repetitive sequence exists probably in extra-chromosome.

  15. Complete genome sequence of Catenulispora acidiphila type strain (ID 139908T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copeland, Alex; Lapidus, Alla; Rio, Tijana GlavinaDel

    2009-05-20

    Catenulispora acidiphila Busti et al. 2006 is the type species of the genus Catenulispora, and is of interest because of the rather isolated phylogenetic location of the genomically little studied suborder Catenulisporineae within the order Actinomycetales. C. acidiphilia is known for its acidophilic, aerobic lifestyle, but can also grow scantly under anaerobic conditions. Under regular conditions C. acidiphilia grows in long filaments of relatively short aerial hyphae with marked septation. It is a free living, non motile, Gram-positive bacterium isolated from a forest soil sample taken from a wooded area in Gerenzano, Italy. Here we describe the features of thismore » organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the actinobacterial family Catenulisporaceae, and the 10,467,782 bp long single replicon genome with its 9056 protein-coding and 69 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  16. Complete genome sequence of Campylobacter fetus subsp. testudinum type strain 03-427T

    USDA-ARS?s Scientific Manuscript database

    Campylobacter fetus subsp. testudinum has been isolated from reptiles and humans. This Campylobacter subspecies is genetically distinct from other C. fetus subspecies. Here we present the first whole genome sequence for this C. fetus subspecies....

  17. Information Topics of Greatest Interest for Return of Genome Sequencing Results among Women Diagnosed with Breast Cancer at a Young Age.

    PubMed

    Seo, Joann; Ivanovich, Jennifer; Goodman, Melody S; Biesecker, Barbara B; Kaphingst, Kimberly A

    2017-06-01

    We investigated what information women diagnosed with breast cancer at a young age would want to learn when genome sequencing results are returned. We conducted 60 semi-structured interviews with women diagnosed with breast cancer at age 40 or younger. We examined what specific information participants would want to learn across result types and for each type of result, as well as how much information they would want. Genome sequencing was not offered to participants as part of the study. Two coders independently coded interview transcripts; analysis was conducted using NVivo10. Across result types, participants wanted to learn about health implications, risk and prevalence in quantitative terms, causes of variants, and causes of diseases. Participants wanted to learn actionable information for variants affecting risk of preventable or treatable disease, medication response, and carrier status. The amount of desired information differed for variants affecting risk of unpreventable or untreatable disease, with uncertain significance, and not health-related. Women diagnosed with breast cancer at a young age recognize the value of genome sequencing results in identifying potential causes and effective treatments and expressed interest in using the information to help relatives and to further understand their other health risks. Our findings can inform the development of effective feedback strategies for genome sequencing that meet patients' information needs and preferences.

  18. Global MLST of Salmonella Typhi Revisited in Post-genomic Era: Genetic Conservation, Population Structure, and Comparative Genomics of Rare Sequence Types.

    PubMed

    Yap, Kien-Pong; Ho, Wing S; Gan, Han M; Chai, Lay C; Thong, Kwai L

    2016-01-01

    Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus sequence typing (MLST) is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2) co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC, and tviD that may explain the variations that differentiate between seemingly successful (widespread) and unsuccessful (poor dissemination) S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.

  19. Optofluidic Single-Cell Genome Amplification of Sub-micron Bacteria in the Ocean Subsurface

    PubMed Central

    Landry, Zachary C.; Vergin, Kevin; Mannenbach, Christopher; Block, Stephen; Yang, Qiao; Blainey, Paul; Carlson, Craig; Giovannoni, Stephen

    2018-01-01

    Optofluidic single-cell genome amplification was used to obtain genome sequences from sub-micron cells collected from the euphotic and mesopelagic zones of the northwestern Sargasso Sea. Plankton cells were visually selected and manually sorted with an optical trap, yielding 20 partial genome sequences representing seven bacterial phyla. Two organisms, E01-9C-26 (Gammaproteobacteria), represented by four single cell genomes, and Opi.OSU.00C, an uncharacterized Verrucomicrobia, were the first of their types retrieved by single cell genome sequencing and were studied in detail. Metagenomic data showed that E01-9C-26 is found throughout the dark ocean, while Opi.OSU.00C was observed to bloom transiently in the nutrient-depleted euphotic zone of the late spring and early summer. The E01-9C-26 genomes had an estimated size of 4.76–5.05 Mbps, and contained “O” and “W”-type monooxygenase genes related to methane and ammonium monooxygenases that were previously reported from ocean metagenomes. Metabolic reconstruction indicated E01-9C-26 are likely versatile methylotrophs capable of scavenging C1 compounds, methylated compounds, reduced sulfur compounds, and a wide range of amines, including D-amino acids. The genome sequences identified E01-9C-26 as a source of “O” and “W”-type monooxygenase genes related to methane and ammonium monooxygenases that were previously reported from ocean metagenomes, but are of unknown function. In contrast, Opi.OSU.00C genomes encode genes for catabolizing carbohydrate compounds normally associated with eukaryotic phytoplankton. This exploration of optofluidics showed that it was effective for retrieving diverse single-cell bacterioplankton genomes and has potential advantages in microbiology applications that require working with small sample volumes or targeting cells by their morphology.

  20. Draft Genome Sequence of Mycobacterium chimaera Type Strain Fl-0169.

    PubMed

    Pfaller, Stacy; Tokarev, Vasily; Kessler, Collin; McLimans, Christopher; Gomez-Alvarez, Vicente; Wright, Justin; King, Dawn; Lamendella, Regina

    2017-02-23

    We report here the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169 T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, although Fl-0169 T possesses unique virulence genes. Copyright © 2017 Pfaller et al.

  1. Comparative analysis of the complete mitochondrial genomes of two types of ducks, peking duck (Anas platyrhychos) and muscovy duck (Cairina moschata).

    PubMed

    Tu, Jianfeng; Yang, Ying; Yang, Fuhe; Xing, Xiumei

    2017-03-01

    Peking duck (Anas platyrhychos) and Muscovy duck (Cairina moschata) are two types of domestic ducks and the most popular meat breeds on the world. In this study, we sequenced and compared complete mitochondrial genomes of both breeds. In order to investigate the phylogeny of both breeds within Anseriformes, the sequences of concatenated 12 protein-coding genes were used for phylogenetic analysis. The result was consistent with most of the previous morphological and molecular studies. Our complete mitochondrial genome sequences of both breeds will be useful information in phylogenetics, and be available as basic data for the breeding and genetics.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A

    Segniliparus rotundus Butler 2005 is the type species of the genus Segniliparus, which is cur-rently the only genus in the corynebacterial family Segniliparaceae. This family is of large in-terest because of a novel late-emerging genus-specific mycolate pattern. The type strain has been isolated from human sputum and is probably an opportunistic pathogen. Here we de-scribe the features of this organism, together with the complete genome sequence and anno-tation. This is the first completed genome sequence of the family Segniliparaceae. The 3,157,527 bp long genome with its 3,081 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteriamore » and Archaea project.« less

  3. Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome

    PubMed Central

    Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying

    2015-01-01

    Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominately around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiency of the long reads in these aspects using isogenic C. elegans genome with no gap. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. elegans genome. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp missing genomic sequences are recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24×reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithm for the long reads. PMID:26039588

  4. Complete Genome Sequence of the Avian-Pathogenic Escherichia coli Strain APEC O18

    PubMed Central

    Nicholson, Bryon A.; Wannemuehler, Yvonne M.; Logue, Catherine M.; Li, Ganwu

    2016-01-01

    Avian-pathogenic Escherichia coli (APEC) is the causative agent of colibacillosis, a disease that affects all facets of poultry production worldwide, resulting in multimillion dollar losses annually. Here, we report the genome sequence of an APEC O18 sequence type 95 (ST95) strain associated with disease in a chicken. PMID:27811098

  5. Draft genome sequences of 9 LA-MRSA ST5 isolates obtained from humans after short term swine contact

    USDA-ARS?s Scientific Manuscript database

    Livestock associated methicillin resistant Staphylococcus aureus (LA-MRSA) sequence type 5 have raised concerns surrounding the potential for these isolates to colonize or cause disease in humans with swine contact. Here, we report draft genome sequences for 9 LA-MRSA ST5 isolates obtained from huma...

  6. Draft genome sequences of 14 swine associated LA-MRSA ST398 isolates from the U.S.

    USDA-ARS?s Scientific Manuscript database

    Livestock associated methicillin resistant Staphylococcus aureus (LA-MRSA) is part of the normal microbiota of swine. The initial and predominant swine associated LA-MRSA sequence type (ST) identified is ST398. Here, we present 14 draft genome sequence from LA-MRSA ST398 isolates found in the US....

  7. Molecular epidemiology of infectious laryngotracheitis: a review

    USDA-ARS?s Scientific Manuscript database

    Falconid herpesvirus type 1 (FHV-1) is the causative agent of falcon inclusion body disease, an acute, highly contagious disease of raptors. The complete nucleotide sequence of the genome of FHV-1 has been determined. The genome is arranged as a D-type genome with large inverted repeats flanking a ...

  8. Sequence-specific epigenetic effects of the maternal somatic genome on developmental rearrangements of the zygotic genome in Paramecium primaurelia.

    PubMed Central

    Meyer, E; Butler, A; Dubrana, K; Duharcourt, S; Caron, F

    1997-01-01

    In ciliates, the germ line genome is extensively rearranged during the development of the somatic macronucleus from a mitotic product of the zygotic nucleus. Germ line chromosomes are fragmented in specific regions, and a large number of internal sequence elements are eliminated. It was previously shown that transformation of the vegetative macronucleus of Paramecium primaurelia with a plasmid containing a subtelomeric surface antigen gene can affect the processing of the homologous germ line genomic region during development of a new macronucleus in sexual progeny of transformed clones. The gene and telomere-proximal flanking sequences are deleted from the new macronuclear genome, although the germ line genome remains wild type. Here we show that plasmids containing nonoverlapping segments of the same genomic region are able to induce similar terminal deletions; the locations of deletion end points depend on the particular sequence used. Transformation of the maternal macronucleus with a sequence internal to a macronuclear chromosome also causes the occurrence of internal deletions between short direct repeats composed of alternating thymines and adenines. The epigenetic influence of maternal macronuclear sequences on developmental rearrangements of the zygotic genome thus appears to be both sequence specific and general, suggesting that this trans-nucleus effect is mediated by pairing of homologous sequences. PMID:9199294

  9. Identification and characterization of EBV genomes in spontaneously immortalized human peripheral blood B lymphocytes by NGS technology.

    PubMed

    Lei, Haiyan; Li, Tianwei; Hung, Guo-Chiuan; Li, Bingjie; Tsai, Shien; Lo, Shyh-Ching

    2013-11-19

    We conducted genomic sequencing to identify Epstein Barr Virus (EBV) genomes in 2 human peripheral blood B lymphocytes that underwent spontaneous immortalization promoted by mycoplasma infections in culture, using the high-throughput sequencing (HTS) Illumina MiSeq platform. The purpose of this study was to examine if rapid detection and characterization of a viral agent could be effectively achieved by HTS using a platform that has become readily available in general biology laboratories. Raw read sequences, averaging 175 bps in length, were mapped with DNA databases of human, bacteria, fungi and virus genomes using the CLC Genomics Workbench bioinformatics tool. Overall 37,757 out of 49,520,834 total reads in one lymphocyte line (# K4413-Mi) and 28,178 out of 45,335,960 reads in the other lymphocyte line (# K4123-Mi) were identified as EBV sequences. The two EBV genomes with estimated 35.22-fold and 31.06-fold sequence coverage respectively, designated K4413-Mi EBV and K4123-Mi EBV (GenBank accession number KC440852 and KC440851 respectively), are characteristic of type-1 EBV. Sequence comparison and phylogenetic analysis among K4413-Mi EBV, K4123-Mi EBV and the EBV genomes previously reported to GenBank as well as the NA12878 EBV genome assembled from database of the 1000 Genome Project showed that these 2 EBVs are most closely related to B95-8, an EBV previously isolated from a patient with infectious mononucleosis and WT-EBV. They are less similar to EBVs associated with nasopharyngeal carcinoma (NPC) from Hong Kong and China as well as the Akata strain of a case of Burkitt's lymphoma from Japan. They are most different from type 2 EBV found in Western African Burkitt's lymphoma.

  10. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspectmore » centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.« less

  11. Complete Genome Sequences of 44 Arthrobacter Phages.

    PubMed

    Klyczek, Karen K; Jacobs-Sera, Deborah; Adair, Tamarah L; Adams, Sandra D; Ball, Sarah L; Benjamin, Robert C; Bonilla, J Alfred; Breitenberger, Caroline A; Daniels, Charles J; Gaffney, Bobby L; Harrison, Melinda; Hughes, Lee E; King, Rodney A; Krukonis, Gregory P; Lopez, A Javier; Monsen-Collar, Kirsten; Pizzorno, Marie C; Rinehart, Claire A; Staples, Amanda K; Stowe, Emily L; Garlena, Rebecca A; Russell, Daniel A; Cresawn, Steven G; Pope, Welkin H; Hatfull, Graham F

    2018-02-01

    We report here the complete genome sequences of 44 phages infecting Arthrobacter sp. strain ATCC 21022. These phages have double-stranded DNA genomes with sizes ranging from 15,680 to 70,707 bp and G+C contents from 45.1% to 68.5%. All three tail types (belonging to the families Siphoviridae , Myoviridae , and Podoviridae ) are represented. Copyright © 2018 Klyczek et al.

  12. Complete Genome Sequences of 44 Arthrobacter Phages

    PubMed Central

    Klyczek, Karen K.; Adair, Tamarah L.; Adams, Sandra D.; Ball, Sarah L.; Benjamin, Robert C.; Bonilla, J. Alfred; Breitenberger, Caroline A.; Daniels, Charles J.; Gaffney, Bobby L.; Harrison, Melinda; Hughes, Lee E.; King, Rodney A.; Krukonis, Gregory P.; Lopez, A. Javier; Monsen-Collar, Kirsten; Pizzorno, Marie C.; Staples, Amanda K.; Stowe, Emily L.; Garlena, Rebecca A.; Russell, Daniel A.

    2018-01-01

    ABSTRACT We report here the complete genome sequences of 44 phages infecting Arthrobacter sp. strain ATCC 21022. These phages have double-stranded DNA genomes with sizes ranging from 15,680 to 70,707 bp and G+C contents from 45.1% to 68.5%. All three tail types (belonging to the families Siphoviridae, Myoviridae, and Podoviridae) are represented. PMID:29437090

  13. Complete Genome Sequence of the Fruiting Myxobacterium Melittangium boletus DSM 14713.

    PubMed

    Treuner-Lange, Anke; Bruckskotten, Marc; Rupp, Oliver; Goesmann, Alexander; Søgaard-Andersen, Lotte

    2017-11-09

    The formation of spore-filled fruiting bodies in response to starvation represents a hallmark of many members of the order Myxococcales Here, we present the complete 9.9-Mb genome of the fruiting type strain Melittangium boletus DSM 14713, the first member of this genus to have its genome sequenced. Copyright © 2017 Treuner-Lange et al.

  14. Genome Sequence of Gammaproteobacterial Pseudohaliea rubra Type Strain DSM 19751, Isolated from Coastal Seawater of the Mediterranean Sea

    PubMed Central

    Fiebig, Anne; Riedel, Thomas; Göker, Markus; Klenk, Hans-Peter

    2014-01-01

    Pseudohaliea rubra strain DSM 19751T is an aerobic marine gammaproteobacterium that was isolated from surface coastal seawater of the Mediterranean Sea. Here, we present its genome sequence and annotation. Genome analysis revealed the presence of genes involved in the synthesis of bacteriochlorophyll-a and the reserve compound glycogen. PMID:25414506

  15. Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

    PubMed

    Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

    2018-01-01

    Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.

  16. Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

    PubMed Central

    Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

    2018-01-01

    Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441

  17. DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

    PubMed Central

    2013-01-01

    Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination. Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217

  18. Genome sequence of the moderately thermophilic sulfur-reducing bacterium Thermanaerovibrio velox type strain (Z-9701 T) and emended description of the genus Thermanaerovibrio

    DOE PAGES

    Palaniappan, Krishna; Meier-Kolthoff, Jan P.; Teshima, Hazuki; ...

    2013-10-16

    Thermanaerovibrio velox Zavarzina et al. 2000 is a member of the Synergistaceae, a family in the phylum Synergistetes that is already well-characterized at the genome level. Members of this phylum were described as Gram-negative staining anaerobic bacteria with a rod/vibrioid cell shape and possessing an atypical outer cell envelope. They inhabit a large variety of anaerobic environments including soil, oil wells, wastewater treatment plants and animal gastrointestinal tracts. They are also found to be linked to sites of human diseases such as cysts, abscesses, and areas of periodontal disease. The moderately thermophilic and organotrophic T. velox shares most of itsmore » morphologic and physiologic features with the closely related species, T. acidaminovorans. In addition to Su883 T, the type strain of T. acidaminovorans, stain Z-9701 T is the second type strain in the genus Thermanaerovibrio to have its genome sequence published. Here we describe the features of this organism, together with the non-contiguous genome sequence and annotation. The 1,880,838 bp long chromosome (non-contiguous finished sequence) with its 1,751 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  19. Genome sequence of the moderately thermophilic sulfur-reducing bacterium Thermanaerovibrio velox type strain (Z-9701 T) and emended description of the genus Thermanaerovibrio

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palaniappan, Krishna; Meier-Kolthoff, Jan P.; Teshima, Hazuki

    Thermanaerovibrio velox Zavarzina et al. 2000 is a member of the Synergistaceae, a family in the phylum Synergistetes that is already well-characterized at the genome level. Members of this phylum were described as Gram-negative staining anaerobic bacteria with a rod/vibrioid cell shape and possessing an atypical outer cell envelope. They inhabit a large variety of anaerobic environments including soil, oil wells, wastewater treatment plants and animal gastrointestinal tracts. They are also found to be linked to sites of human diseases such as cysts, abscesses, and areas of periodontal disease. The moderately thermophilic and organotrophic T. velox shares most of itsmore » morphologic and physiologic features with the closely related species, T. acidaminovorans. In addition to Su883 T, the type strain of T. acidaminovorans, stain Z-9701 T is the second type strain in the genus Thermanaerovibrio to have its genome sequence published. Here we describe the features of this organism, together with the non-contiguous genome sequence and annotation. The 1,880,838 bp long chromosome (non-contiguous finished sequence) with its 1,751 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  20. Genome sequence of the moderately thermophilic sulfur-reducing bacterium Thermanaerovibrio velox type strain (Z-9701T) and emended description of the genus Thermanaerovibrio

    PubMed Central

    Palaniappan, Krishna; Meier-Kolthoff, Jan P.; Teshima, Hazuki; Nolan, Matt; Lapidus, Alla; Tice, Hope; Del Rio, Tijana Glavina; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Rohde, Manfred; Mayilraj, Shanmugam; Spring, Stefan; Detter, John C.; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2013-01-01

    Thermanaerovibrio velox Zavarzina et al. 2000 is a member of the Synergistaceae, a family in the phylum Synergistetes that is already well-characterized at the genome level. Members of this phylum were described as Gram-negative staining anaerobic bacteria with a rod/vibrioid cell shape and possessing an atypical outer cell envelope. They inhabit a large variety of anaerobic environments including soil, oil wells, wastewater treatment plants and animal gastrointestinal tracts. They are also found to be linked to sites of human diseases such as cysts, abscesses, and areas of periodontal disease. The moderately thermophilic and organotrophic T. velox shares most of its morphologic and physiologic features with the closely related species, T. acidaminovorans. In addition to Su883T, the type strain of T. acidaminovorans, stain Z-9701T is the second type strain in the genus Thermanaerovibrio to have its genome sequence published. Here we describe the features of this organism, together with the non-contiguous genome sequence and annotation. The 1,880,838 bp long chromosome (non-contiguous finished sequence) with its 1,751 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:24501645

  1. Complete genome sequence of Nitratifractor salsuginis type strain (E9I37-1T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Sikorski, Johannes; Zeytun, Ahmet

    Nitratifractor salsuginis Nakagawa et al. 2005 is the type species of the genus Nitratifractor, a member of the family Nautiliaceae. The species is of interest because of its high capacity for nitrate reduction via conversion to N2 through respiration, which is a key compound in plant nutrition. The strain is also of interest because it represents the first mesophilic and faculta- tively anaerobic member of the Epsilonproteobacteria reported to grow on molecular hydro- gen. This is the first completed genome sequence of a member of the genus Nitratifractor and the second sequence from the family Nautiliaceae. The 2,101,285 bp longmore » genome with its 2,121 protein-coding and 54 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  2. Serotype IV Sequence Type 468 Group B Streptococcus Neonatal Invasive Disease, Minnesota, USA.

    PubMed

    Teatero, Sarah; Ferrieri, Patricia; Fittipaldi, Nahuel

    2016-11-01

    To further understand the emergence of serotype IV group B Streptococcus (GBS) invasive disease, we used whole-genome sequencing to characterize 3 sequence type 468 strains isolated from neonates in Minnesota, USA. We found that strains of tetracycline-resistant sequence type 468 GBS have acquired virulence genes from a putative clonal complex 17 GBS donor by recombination.

  3. Population genomic data reveal genes related to important traits of quail.

    PubMed

    Wu, Yan; Zhang, Yaolei; Hou, Zhuocheng; Fan, Guangyi; Pi, Jinsong; Sun, Shuai; Chen, Jiang; Liu, Huaqiao; Du, Xiao; Shen, Jie; Hu, Gang; Chen, Wenbin; Pan, Ailuan; Yin, Pingping; Chen, Xiaoli; Pu, Yuejin; Zhang, He; Liang, Zhenhua; Jian, Jianbo; Zhang, Hao; Wu, Bin; Sun, Jing; Chen, Jianwei; Tao, Hu; Yang, Ting; Xiao, Hongwei; Yang, Huan; Zheng, Chuanwei; Bai, Mingzhou; Fang, Xiaodong; Burt, David W; Wang, Wen; Li, Qingyi; Xu, Xun; Li, Chengfeng; Yang, Huanming; Wang, Jian; Yang, Ning; Liu, Xin; Du, Jinping

    2018-05-01

    Japanese quail (Coturnix japonica), a recently domesticated poultry species, is important not only as an agricultural product, but also as a model bird species for genetic research. However, most of the biological questions concerning genomics, phylogenetics, and genetics of some important economic traits have not been answered. It is thus necessary to complete a high-quality genome sequence as well as a series of comparative genomics, evolution, and functional studies. Here, we present a quail genome assembly spanning 1.04 Gb with 86.63% of sequences anchored to 30 chromosomes (28 autosomes and 2 sex chromosomes Z/W). Our genomic data have resolved the long-term debate of phylogeny among Perdicinae (Japanese quail), Meleagridinae (turkey), and Phasianinae (chicken). Comparative genomics and functional genomic data found that four candidate genes involved in early maturation had experienced positive selection, and one of them encodes follicle stimulating hormone beta (FSHβ), which is correlated with different FSHβ levels in quail and chicken. We re-sequenced 31 quails (10 wild, 11 egg-type, and 10 meat-type) and identified 18 and 26 candidate selective sweep regions in the egg-type and meat-type lines, respectively. That only one of them is shared between egg-type and meat-type lines suggests that they were subject to an independent selection. We also detected a haplotype on chromosome Z, which was closely linked with maroon/yellow plumage in quail using population resequencing and a genome-wide association study. This haplotype block will be useful for quail breeding programs. This study provided a high-quality quail reference genome, identified quail-specific genes, and resolved quail phylogeny. We have identified genes related to quail early maturation and a marker for plumage color, which is significant for quail breeding. These results will facilitate biological discovery in quails and help us elucidate the evolutionary processes within the Phasianidae family.

  4. Selection and Validation of a Multilocus Variable-Number Tandem-Repeat Analysis Panel for Typing Shigella spp.▿ †

    PubMed Central

    Gorgé, Olivier; Lopez, Stéphanie; Hilaire, Valérie; Lisanti, Olivier; Ramisse, Vincent; Vergnaud, Gilles

    2008-01-01

    The Shigella genus has historically been separated into four species, based on biochemical assays. The classification within each species relies on serotyping. Recently, genome sequencing and DNA assays, in particular the multilocus sequence typing (MLST) approach, greatly improved the current knowledge of the origin and phylogenetic evolution of Shigella spp. The Shigella and Escherichia genera are now considered to belong to a unique genomospecies. Multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses of highly homogeneous bacterial pathogens. Here, we assess the capability of MLVA for Shigella typing. Thirty-two potentially polymorphic VNTRs were selected by analyzing in silico five Shigella genomic sequences and subsequently evaluated. Eventually, a panel of 15 VNTRs was selected (i.e., MLVA15 analysis). MLVA15 analysis of 78 strains or genome sequences of Shigella spp. and 11 strains or genome sequences of Escherichia coli distinguished 83 genotypes. Shigella population cluster analysis gave consistent results compared to MLST. MLVA15 analysis showed capabilities for E. coli typing, providing classification among pathogenic and nonpathogenic E. coli strains included in the study. The resulting data can be queried on our genotyping webpage (http://mlva.u-psud.fr). The MLVA15 assay is rapid, highly discriminatory, and reproducible for Shigella and Escherichia strains, suggesting that it could significantly contribute to epidemiological trace-back analysis of Shigella infections and pathogenic Escherichia outbreaks. Typing was performed on strains obtained mostly from collections. Further studies should include strains of much more diverse origins, including all pathogenic E. coli types. PMID:18216214

  5. Complete genome sequence of the moderately thermophilic mineral-sulfide-oxidizing firmicute Sulfobacillus acidophilus type strain (NALT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Chertkov, Olga; Chen, Amy

    2012-01-01

    Sulfobacillus acidophilus Norris et al. 1996 is a member of the genus Sulfobacillus which comprises five species of the order Clostridiales. Sulfobacillus species are of interest for comparison to other sulfur and iron oxidizers and also have biomining applications. This is the first completed genome sequence of a type strain of the genus Sulfobacillus, and the second published genome of a member of the species S. acidophilus. The genome, which consists of one chromosome and one plasmid with a total size of 3,557,831 bp, harbors 3,626 protein-coding and 69 RNA genes, and is a part of the Genomic Encyclopedia ofmore » Bacteria and Archaea project.« less

  6. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    PubMed

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  7. Novel Single Nucleotide Polymorphism-Based Assay for Genotyping Mycobacterium avium subsp. paratuberculosis

    PubMed Central

    Goldstone, Robert J.; McLuckie, Joyce; Smith, David G. E.

    2015-01-01

    Typing of Mycobacterium avium subspecies paratuberculosis strains presents a challenge, since they are genetically monomorphic and traditional molecular techniques have limited discriminatory power. The recent advances and availability of whole-genome sequencing have extended possibilities for the characterization of Mycobacterium avium subspecies paratuberculosis, and whole-genome sequencing can provide a phylogenetic context to facilitate global epidemiology studies. In this study, we developed a single nucleotide polymorphism (SNP) assay based on PCR and restriction enzyme digestion or sequencing of the amplified product. The SNP analysis was performed using genome sequence data from 133 Mycobacterium avium subspecies paratuberculosis isolates with different genotypes from 8 different host species and 17 distinct geographic regions around the world. A total of 28,402 SNPs were identified among all of the isolates. The minimum number of SNPs required to distinguish between all of the 133 genomes was 93 and between only the type C isolates was 41. To reduce the number of SNPs and PCRs required, we adopted an approach based on sequential detection of SNPs and a decision tree. By the analysis of 14 SNPs Mycobacterium avium subspecies paratuberculosis isolates can be characterized within 14 phylogenetic groups with a higher discriminatory power than mycobacterial interspersed repetitive unit–variable number tandem repeat assay and other typing methods. Continuous updating of genome sequences is needed in order to better characterize new phylogenetic groups and SNP profiles. The novel SNP assay is a discriminative, simple, reproducible method and requires only basic laboratory equipment for the large-scale global typing of Mycobacterium avium subspecies paratuberculosis isolates. PMID:26677250

  8. Comparative pathogenomics of Clostridium tetani.

    PubMed

    Cohen, Jonathan E; Wang, Rong; Shen, Rong-Fong; Wu, Wells W; Keller, James E

    2017-01-01

    Clostridium tetani and Clostridium botulinum produce two of the most potent neurotoxins known, tetanus neurotoxin and botulinum neurotoxin, respectively. Extensive biochemical and genetic investigation has been devoted to identifying and characterizing various C. botulinum strains. Less effort has been focused on studying C. tetani likely because recently sequenced strains of C. tetani show much less genetic diversity than C. botulinum strains and because widespread vaccination efforts have reduced the public health threat from tetanus. Our aim was to acquire genomic data on the U.S. vaccine strain of C. tetani to better understand its genetic relationship to previously published genomic data from European vaccine strains. We performed high throughput genomic sequence analysis on two wild-type and two vaccine C. tetani strains. Comparative genomic analysis was performed using these and previously published genomic data for seven other C. tetani strains. Our analysis focused on single nucleotide polymorphisms (SNP) and four distinct constituents of the mobile genome (mobilome): a hypervariable flagellar glycosylation island region, five conserved bacteriophage insertion regions, variations in three CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems, and a single plasmid. Intact type IA and IB CRISPR/Cas systems were within 10 of 11 strains. A type IIIA CRISPR/Cas system was present in two strains. Phage infection histories derived from CRISPR-Cas sequences indicate C. tetani encounters phages common among commensal gut bacteria and soil-borne organisms consistent with C. tetani distribution in nature. All vaccine strains form a clade distinct from currently sequenced wild type strains when considering variations in these mobile elements. SNP, flagellar glycosylation island, prophage content and CRISPR/Cas phylogenic histories provide tentative evidence suggesting vaccine and wild type strains share a common ancestor.

  9. Bifidobacterium animalis subsp. lactis ATCC 27673 Is a Genomically Unique Strain within Its Conserved Subspecies

    PubMed Central

    Loquasto, Joseph R.; Barrangou, Rodolphe; Dudley, Edward G.; Stahl, Buffy; Chen, Chun

    2013-01-01

    Many strains of Bifidobacterium animalis subsp. lactis are considered health-promoting probiotic microorganisms and are commonly formulated into fermented dairy foods. Analyses of previously sequenced genomes of B. animalis subsp. lactis have revealed little genetic diversity, suggesting that it is a monomorphic subspecies. However, during a multilocus sequence typing survey of Bifidobacterium, it was revealed that B. animalis subsp. lactis ATCC 27673 gave a profile distinct from that of the other strains of the subspecies. As part of an ongoing study designed to understand the genetic diversity of this subspecies, the genome of this strain was sequenced and compared to other sequenced genomes of B. animalis subsp. lactis and B. animalis subsp. animalis. The complete genome of ATCC 27673 was 1,963,012 bp, contained 1,616 genes and 4 rRNA operons, and had a G+C content of 61.55%. Comparative analyses revealed that the genome of ATCC 27673 contained six distinct genomic islands encoding 83 open reading frames not found in other strains of the same subspecies. In four islands, either phage or mobile genetic elements were identified. In island 6, a novel clustered regularly interspaced short palindromic repeat (CRISPR) locus which contained 81 unique spacers was identified. This type I-E CRISPR-cas system differs from the type I-C systems previously identified in this subspecies, representing the first identification of a different system in B. animalis subsp. lactis. This study revealed that ATCC 27673 is a strain of B. animalis subsp. lactis with novel genetic content and suggests that the lack of genetic variability observed is likely due to the repeated sequencing of a limited number of widely distributed commercial strains. PMID:23995933

  10. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, orderedmore » restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.« less

  11. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity.

    PubMed

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A; Awosika, Joy; Briska, Adam; Ptashkin, Ryan N; Wagner, Trevor; Rajanna, Chythanya; Tsang, Hsinyi; Johnson, Shannon L; Mokashi, Vishwesh P; Chain, Patrick S G; Sozhamannan, Shanmuga

    2015-01-01

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.

  12. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity

    DOE PAGES

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.; ...

    2015-03-20

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, orderedmore » restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.« less

  13. Leptospira species molecular epidemiology in the genomic era.

    PubMed

    Caimi, K; Repetto, S A; Varni, V; Ruybal, P

    2017-10-01

    Leptospirosis is a zoonotic disease which global burden is increasing often related to climatic change. Hundreds of whole genome sequences from worldwide isolates of Leptospira spp. are available nowadays, together with online tools that permit to assign MLST sequence types (STs) directly from raw sequence data. In this work we have applied R7L-MLST to near 500 genomes and strains collection globally distributed. All 10 pathogenic species as well as intermediate were typed using this MLST scheme. The correlation observed between STs and serogroups in our previous work, is still satisfied with this higher dataset sustaining the implementation of MLST to assist serological classification as a complementary approach. Bayesian phylogenetic analysis of concatenated sequences from R7-MLST loci allowed us to resolve taxonomic inconsistencies but also showed that events such as recombination, gene conversion or lateral gene transfer played an important role in the evolution of Leptospira genus. Whole genome sequencing allows us to contribute with suitable epidemiologic information useful to apply in the design of control strategies and also in diagnostic methods for this illness. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Universal Influenza B Virus Genomic Amplification Facilitates Sequencing, Diagnostics, and Reverse Genetics

    PubMed Central

    Zhou, Bin; Lin, Xudong; Wang, Wei; Halpin, Rebecca A.; Bera, Jayati; Stockwell, Timothy B.; Barr, Ian G.

    2014-01-01

    Although human influenza B virus (IBV) is a significant human pathogen, its great genetic diversity has limited our ability to universally amplify the entire genome for subsequent sequencing or vaccine production. The generation of sequence data via next-generation approaches and the rapid cloning of viral genes are critical for basic research, diagnostics, antiviral drugs, and vaccines to combat IBV. To overcome the difficulty of amplifying the diverse and ever-changing IBV genome, we developed and optimized techniques that amplify the complete segmented negative-sense RNA genome from any IBV strain in a single tube/well (IBV genomic amplification [IBV-GA]). Amplicons for >1,000 diverse IBV genomes from different sample types (e.g., clinical specimens) were generated and sequenced using this robust technology. These approaches are sensitive, robust, and sequence independent (i.e., universally amplify past, present, and future IBVs), which facilitates next-generation sequencing and advanced genomic diagnostics. Importantly, special terminal sequences engineered into the optimized IBV-GA2 products also enable ligation-free cloning to rapidly generate reverse-genetics plasmids, which can be used for the rescue of recombinant viruses and/or the creation of vaccine seed stock. PMID:24501036

  15. Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species

    PubMed Central

    Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Yu, Yeisoo; Yang, Kiwoung; Choi, Beom-Soon; Koh, Hee-Jong; Waminal, Nomar Espinosa; Choi, Hong-Il; Kim, Nam-Hoon; Jang, Woojong; Park, Hyun-Seung; Lee, Jonghoon; Lee, Hyun Oh; Joh, Ho Jun; Lee, Hyeon Ju; Park, Jee Young; Perumal, Sampath; Jayakodi, Murukarthick; Lee, Yun Sun; Kim, Backki; Copetti, Dario; Kim, Soonok; Kim, Sunggil; Lim, Ki-Byung; Kim, Young-Dong; Lee, Jungho; Cho, Kwang-Su; Park, Beom-Seok; Wing, Rod A.; Yang, Tae-Jin

    2015-01-01

    Cytoplasmic chloroplast (cp) genomes and nuclear ribosomal DNA (nR) are the primary sequences used to understand plant diversity and evolution. We introduce a high-throughput method to simultaneously obtain complete cp and nR sequences using Illumina platform whole-genome sequence. We applied the method to 30 rice specimens belonging to nine Oryza species. Concurrent phylogenomic analysis using cp and nR of several of specimens of the same Oryza AA genome species provides insight into the evolution and domestication of cultivated rice, clarifying three ambiguous but important issues in the evolution of wild Oryza species. First, cp-based trees clearly classify each lineage but can be biased by inter-subspecies cross-hybridization events during speciation. Second, O. glumaepatula, a South American wild rice, includes two cytoplasm types, one of which is derived from a recent interspecies hybridization with O. longistminata. Third, the Australian O. rufipogan-type rice is a perennial form of O. meridionalis. PMID:26506948

  16. Complete Genome Sequence and Comparative Genomics of a Streptococcus pyogenes emm3 Strain M3-b isolated from a Japanese Patient with Streptococcal Toxic Shock Syndrome.

    PubMed

    Ogura, Kohei; Watanabe, Shinya; Kirikae, Teruo; Miyoshi-Akiyama, Tohru

    2017-01-01

    Epidemiologic typing of Streptococcus pyogenes (GAS) is frequently based on the genotype of the emm gene, which encodes M/Emm protein. In this study, the complete genome sequence of GAS emm3 strain M3-b, isolated from a patient with streptococcal toxic shock syndrome (STSS), was determined. This strain exhibited 99% identity with other complete genome sequences of emm3 strains MGAS315, SSI-1, and STAB902. The complete genomes of five additional strains isolated from Japanese patients with and without STSS were also sequences. Maximum-likelihood phylogenetic analysis showed that strains M3-b, M3-e, and SSI-1, all which were isolated from STSS patients, were relatively close.

  17. The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

    PubMed

    Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H

    2018-01-05

    Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.

  18. The complete genome sequence of human adenovirus 84, a highly recombinant new Human mastadenovirus D type with a unique fiber gene.

    PubMed

    Kaján, Győző L; Kajon, Adriana E; Pinto, Alexis Castillo; Bartha, Dániel; Arnberg, Niklas

    2017-10-15

    A novel human adenovirus was isolated from a pediatric case of acute respiratory disease in Panama City, Panama in 2011. The clinical isolate was initially identified as an intertypic recombinant based on hexon and fiber gene sequencing. Based on the analysis of its complete genome sequence, the novel complex recombinant Human mastadenovirus D (HAdV-D) strain was classified into a new HAdV type: HAdV-84, and it was designated Adenovirus D human/PAN/P309886/2011/84[P43H17F84]. HAdV-D types possess usually an ocular or gastrointestinal tropism, and respiratory association is scarcely reported. The virus has a novel fiber type, most closely related to, but still clearly distant from that of HAdV-36. The predicted fiber is hypothesised to bind sialic acid with lower affinity compared to HAdV-37. Bioinformatic analysis of the complete genomic sequence of HAdV-84 revealed multiple homologous recombination events and provided deeper insight into HAdV evolution. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Genomic insights from whole genome sequencing of four clonal outbreak Campylobacter jejuni assessed within the global C. jejuni population.

    PubMed

    Clark, Clifford G; Berry, Chrystal; Walker, Matthew; Petkau, Aaron; Barker, Dillon O R; Guan, Cai; Reimer, Aleisha; Taboada, Eduardo N

    2016-12-03

    Whole genome sequencing (WGS) is useful for determining clusters of human cases, investigating outbreaks, and defining the population genetics of bacteria. It also provides information about other aspects of bacterial biology, including classical typing results, virulence, and adaptive strategies of the organism. Cell culture invasion and protein expression patterns of four related multilocus sequence type 21 (ST21) C. jejuni isolates from a significant Canadian water-borne outbreak were previously associated with the presence of a CJIE1 prophage. Whole genome sequencing was used to examine the genetic diversity among these isolates and confirm that previous observations could be attributed to differential prophage carriage. Moreover, we sought to determine the presence of genome sequences that could be used as surrogate markers to delineate outbreak-associated isolates. Differential carriage of the CJIE1 prophage was identified as the major genetic difference among the four outbreak isolates. High quality single-nucleotide variant (hqSNV) and core genome multilocus sequence typing (cgMLST) clustered these isolates within expanded datasets consisting of additional C. jejuni strains. The number and location of homopolymeric tract regions was identical in all four outbreak isolates but differed from all other C. jejuni examined. Comparative genomics and PCR amplification enabled the identification of large chromosomal inversions of approximately 93 kb and 388 kb within the outbreak isolates associated with transducer-like proteins containing long nucleotide repeat sequences. The 93-kb inversion was characteristic of the outbreak-associated isolates, and the gene content of this inverted region displayed high synteny with the reference strain. The four outbreak isolates were clonally derived and differed mainly in the presence of the CJIE1 prophage, validating earlier findings linking the prophage to phenotypic differences in virulence assays and protein expression. The identification of large, genetically syntenous chromosomal inversions in the genomes of outbreak-associated isolates provided a unique method for discriminating outbreak isolates from the background population. Transducer-like proteins appear to be associated with the chromosomal inversions. CgMLST and hqSNV analysis also effectively delineated the outbreak isolates within the larger C. jejuni population structure.

  20. Species-specific Typing of DNA Based on Palindrome Frequency Patterns

    PubMed Central

    Lamprea-Burgunder, Estelle; Ludin, Philipp; Mäser, Pascal

    2011-01-01

    DNA in its natural, double-stranded form may contain palindromes, sequences which read the same from either side because they are identical to their reverse complement on the sister strand. Short palindromes are underrepresented in all kinds of genomes. The frequency distribution of short palindromes exhibits more than twice the inter-species variance of non-palindromic sequences, which renders palindromes optimally suited for the typing of DNA. Here, we show that based on palindrome frequency, DNA sequences can be discriminated to the level of species of origin. By plotting the ratios of actual occurrence to expectancy, we generate palindrome frequency patterns that allow to cluster different sequences of the same genome and to assign plasmids, and in some cases even viruses to their respective host genomes. This finding will be of use in the growing field of metagenomics. PMID:21429991

  1. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593

  2. High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1T (DSM 17521T) isolated from muddy waters of a drainage system in Chandigarh, India

    DOE PAGES

    Mukherjee, Supratim; Lapidus, Alla; Shapiro, Nicole; ...

    2015-02-09

    Pontibacter roseus is a member of genus Pontibacter family Cytophagaceae, class Cytophagia. While the type species of the genus Pontibacter actiniarum was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats ranging from seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1 T along with its complete genome sequence and annotation from a culture of DSM 17521 T. In conclusion, the 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50more » RNA genes and is a part of Genomic Encyclopedia of Type Strains: KMG-I project.« less

  3. Optimization and comparative analysis of plant organellar DNA enrichment methods suitable for next generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Plant organellar genomes contain large repetitive elements that may undergo pairing or recombination to form complex structures and/or sub-genomic fragments. Organellar genomes also exist in admixtures within a given cell or tissue type (heteroplasmy) and abundance of sub-types may change through de...

  4. Whole-Genome-Sequencing characterization of bloodstream infection-causing hypervirulent Klebsiella pneumoniae of capsular serotype K2 and ST374.

    PubMed

    Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Ou, Hong-Yu; Qu, Hongping

    2018-01-01

    Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. The Strain RJF293 was determined to be multilocis sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 10 2 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance.

  5. Whole-Genome-Sequencing characterization of bloodstream infection-causing hypervirulent Klebsiella pneumoniae of capsular serotype K2 and ST374

    PubMed Central

    Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Qu, Hongping

    2018-01-01

    ABSTRACT Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. The Strain RJF293 was determined to be multilocis sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 102 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance. PMID:29338592

  6. Insights from 20 years of bacterial genome sequencing

    DOE PAGES

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.« less

  7. Insights from 20 years of bacterial genome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.« less

  8. Draft Genome Sequence of Pedobacter agri PB92T, Which Belongs to the Family Sphingobacteriaceae

    PubMed Central

    Lee, Myunglip; Roh, Seong Woon; Lee, Hae-Won; Yim, Kyung June; Kim, Kil-Nam; Bae, Jin-Woo; Choi, Kwang-Sik; Jeon, You-Jin; Jung, Won-Kyo; Kang, Heewan

    2012-01-01

    Strain PB92T of Pedobacter agri, which belongs to the family Sphingobacteriaceae, was isolated from soil in the Republic of Korea. The draft genome of strain PB92T contains 5,141,552 bp, with a G+C content of 38.0%. This is the third genome sequencing project of the type strains among the Pedobacter species. PMID:22740666

  9. Genome Sequence of the Shiga Toxin-Producing Escherichia coli Strain NCCP15657

    PubMed Central

    Kim, Byung Kwon; Song, Geun Cheol; Hong, Gun Hyong; Seong, Won-Keun; Kim, Seon-Young; Jeong, Haeyoung; Kang, Sung Gyun; Kwon, Soon-Kyeong; Lee, Choong Hoon; Song, Ju Yeon; Yu, Dong Su; Park, Mi-Sun

    2012-01-01

    Shiga toxin-producing Escherichia coli causes bloody diarrhea and hemolytic-uremic syndrome and serious outbreaks worldwide. Here, we report the draft genome sequence of E. coli NCCP15657 isolated from a patient. The genome has virulence genes, many in the locus of enterocyte effacement (LEE) island, encoding a metalloprotease, the Shiga toxin, and constituents of type III secretion. PMID:22740674

  10. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies

    PubMed Central

    Karamitros, Timokratis; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from <1% to 53% of amino acids in each gene exhibiting at least one substitution within the pool of samples. The UL23 gene had one of the highest genetic variabilities at 35.2% in keeping with its role in development of drug resistance. The assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal. PMID:27309375

  11. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    PubMed

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from <1% to 53% of amino acids in each gene exhibiting at least one substitution within the pool of samples. The UL23 gene had one of the highest genetic variabilities at 35.2% in keeping with its role in development of drug resistance. The assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  12. Complete genome sequence of the Campylobacter cuniculorum type strain LMG 24588

    USDA-ARS?s Scientific Manuscript database

    Campylobacter cuniculorum has been isolated from rabbits (Oryctolagus cuniculus). Although C. cuniculorum is highly prevalent in rabbits farmed for human consumption, the pathogenicity of this organism in humans is still unknown. This study describes the whole-genome sequence of the C. cuniculorum t...

  13. Integrative Genomics Viewer (IGV) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

  14. Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daligault, Hajnalka E.; Lapidus, Alla L.; Zeytun, Ahmet

    2011-01-01

    Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Halisco- menobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically un- charted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with itsmore » three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  15. Genomes of microsporidia in mosquitoes: status and preliminary findings

    USDA-ARS?s Scientific Manuscript database

    The status and preliminary findings for full genome sequencing of three species of microsporidia with mosquitoes as type hosts will be presented. Vavraia culicis, the type species of the genus Vavraia, was originally described from Culex pipiens. Type material was not available and therefore Vavra...

  16. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of genomic content. Differences in gene content likely contribute to differences in the clinical and environmental distribution of species and sequence types. PMID:23166675

  17. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules. PMID:23555856

  18. Complete genome sequence of Catenulispora acidiphila type strain (ID 139908T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copeland, A; Lapidus, Alla L.; Glavina Del Rio, Tijana

    2009-01-01

    Catenulispora acidiphila Busti et al. 2006 is the type species of the genus Catenulispora, and is of interest because of the rather isolated phylogenetic location it occupies within the scarcely explored suborder Catenulisporineae of the order Actinomycetales. C. acidiphilia is known for its acidophilic, aerobic lifestyle, but can also grow scantly under anaerobic condi-tions. Under regular conditions, C. acidiphilia grows in long filaments of relatively short aerial hyphae with marked septation. It is a free living, non motile, Gram-positive bacterium iso-lated from a forest soil sample taken from a wooded area in Gerenzano, Italy. Here we de-scribe the features ofmore » this organism, together with the complete genome sequence and anno-tation. This is the first complete genome sequence of the actinobacterial family Catenulispo-raceae, and the 10,467,782 bp long single replicon genome with its 9056 protein-coding and 69 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  19. The Complete Mitochondrial DNA Sequence of Scenedesmus obliquus Reflects an Intermediate Stage in the Evolution of the Green Algal Mitochondrial Genome

    PubMed Central

    Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud

    2000-01-01

    Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413

  20. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

  1. Genotyping of Indian antigenic, vaccine, and field Brucella spp. using multilocus sequence typing.

    PubMed

    Shome, Rajeswari; Krithiga, Natesan; Shankaranarayana, Padmashree B; Jegadesan, Sankarasubramanian; Udayakumar S, Vishnu; Shome, Bibek Ranjan; Saikia, Girin Kumar; Sharma, Narendra Kumar; Chauhan, Harshad; Chandel, Bharat Singh; Jeyaprakash, Rajendhran; Rahman, Habibur

    2016-03-31

    Brucellosis is one of the most important zoonotic diseases that affects multiple livestock species and causes great economic losses. The highly conserved genomes of Brucella, with > 90% homology among species, makes it important to study the genetic diversity circulating in the country. A total of 26 Brucella spp. (4 reference strains and 22 field isolates) and 1 B. melitensis draft genome sequence from India (B. melitensis Bm IND1) were included for sequence typing. The field isolates were identified by biochemical tests and confirmed by both conventional and quantitative polymerase chain reaction (qPCR) targeting bcsp 31Brucella genus-specific marker. Brucella speciation and biotyping was done by Bruce ladder, probe qPCR, and AMOS PCRs, respectively, and genotyping was done by multilocus sequence typing (MLST). The MLST typing of 27 Brucella spp. revealed five distinct sequence types (STs); the B. abortus S99 reference strain and 21 B. abortus field isolates belonged to ST1. On the other hand, the vaccine strain B. abortus S19 was genotyped as ST5. Similarly, B. melitensis 16M reference strain and one B. melitensis field isolate were grouped into ST7. Another B. melitensis field isolate belonged to ST8 (draft genome sequence from India), and only B. suis 1330 reference strain was found to be ST14. The sequences revealed genetic similarity of the Indian strains to the global reference and field strains. The study highlights the usefulness of MLST for typing of field isolates and validation of reference strains used for diagnosis and vaccination against brucellosis.

  2. Storage and utilization of HLA genomic data--new approaches to HLA typing.

    PubMed

    Helmberg, W

    2000-01-01

    Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to suchDNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.

  3. Permanent draft genomes of the Rhodopirellula maiorica strain SM1.

    PubMed

    Richter, Michael; Richter-Heitmann, Tim; Klindworth, Anna; Wegner, Carl-Eric; Frank, Carsten S; Harder, Jens; Glöckner, Frank Oliver

    2014-02-01

    The genome of Rhodopirellula maiorica strain SM1 was sequenced as a permanent draft to complement the full genome sequence of the type strain Rhodopirellula baltica SH1(T). This isolate is part of a larger study to infer the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the fifth of a series of five publications reporting in total eight new permanent daft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. Permanent draft genome of Rhodopirellula sallentina SM41.

    PubMed

    Wegner, Carl-Eric; Richter, Michael; Richter-Heitmann, Tim; Klindworth, Anna; Frank, Carsten S; Glöckner, Frank Oliver; Harder, Jens

    2014-02-01

    The genome of Rhodopirellula sallentina SM41 was sequenced as a permanent draft to supplement the full genome sequence of the type strain Rhodopirellula baltica SH1(T). This isolate is part of a larger study to gain insights into the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the third of a series of five publications reporting in total eight new permanent daft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Low levels of LTR retrotransposon deletion by ectopic recombination in the gigantic genomes of salamanders.

    PubMed

    Frahry, Matthew Blake; Sun, Cheng; Chong, Rebecca A; Mueller, Rachel Lockridge

    2015-02-01

    Across the tree of life, species vary dramatically in nuclear genome size. Mutations that add or remove sequences from genomes-insertions or deletions, or indels-are the ultimate source of this variation. Differences in the tempo and mode of insertion and deletion across taxa have been proposed to contribute to evolutionary diversity in genome size. Among vertebrates, most of the largest genomes are found within the salamanders, an amphibian clade with genome sizes ranging from ~14 to ~120 Gb. Salamander genomes have been shown to experience slower rates of DNA loss through small (i.e., <30 bp) deletions than do other vertebrate genomes. However, no studies have addressed DNA loss from salamander genomes resulting from larger deletions. Here, we focus on one type of large deletion-ectopic-recombination-mediated removal of LTR retrotransposon sequences. In ectopic recombination, double-strand breaks are repaired using a "wrong" (i.e., ectopic, or non-allelic) template sequence-typically another locus of similar sequence. When breaks occur within the LTR portions of LTR retrotransposons, ectopic-recombination-mediated repair can produce deletions that remove the internal transposon sequence and the equivalent of one of the two LTR sequences. These deletions leave a signature in the genome-a solo LTR sequence. We compared levels of solo LTRs in the genomes of four salamander species with levels present in five vertebrates with smaller genomes. Our results demonstrate that salamanders have low levels of solo LTRs, suggesting that ectopic-recombination-mediated deletion of LTR retrotransposons occurs more slowly than in other vertebrates with smaller genomes.

  6. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  7. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    PubMed

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  8. Complete genome sequence of Dyadobacter fermentans type strain (NS114T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lang, Elke; Lapidus, Alla; Chertkov, Olga

    Dyadobacter fermentans (Chelius MK and Triplett EW, 2000) is the type species of the genus Dyadobacter. It is of phylogenetic interest because of its location in the Cytophagaceae, a very diverse family within the order 'Sphingobacteriales'. D. fermentans has a mainly respiratory metabolism, stains Gram-negative, is non-motile and oxidase and catalase positive. It is characterized by the production of cell filaments in ageing cultures, a flexirubin-like pigment and its ability to ferment glucose, which is almost unique in the aerobically living members of this taxonomically difficult family. Here we describe the features of this organism, together with the complete genomemore » sequence, and annotation. This is the first complete genome sequence of the 'sphingobacterial' genus Dyadobacter, and this 6,967,790 bp long single replicon genome with its 5804 protein-coding and 50 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  9. Methyl-CpG island-associated genome signature tags

    DOEpatents

    Dunn, John J

    2014-05-20

    Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.

  10. Narrow-Host-Range Bacteriophages That Infect Rhizobium etli Associate with Distinct Genomic Types

    PubMed Central

    Santamaría, Rosa Isela; Bustos, Patricia; Sepúlveda-Robles, Omar; Lozano, Luis; Rodríguez, César; Fernández, José Luis; Juárez, Soledad; Kameyama, Luis; Guarneros, Gabriel; Dávila, Guillermo

    2014-01-01

    In this work, we isolated and characterized 14 bacteriophages that infect Rhizobium etli. They were obtained from rhizosphere soil of bean plants from agricultural lands in Mexico using an enrichment method. The host range of these phages was narrow but variable within a collection of 48 R. etli strains. We obtained the complete genome sequence of nine phages. Four phages were resistant to several restriction enzymes and in vivo cloning, probably due to nucleotide modifications. The genome size of the sequenced phages varied from 43 kb to 115 kb, with a median size of ∼45 to 50 kb. A large proportion of open reading frames of these phage genomes (65 to 70%) consisted of hypothetical and orphan genes. The remainder encoded proteins needed for phage morphogenesis and DNA synthesis and processing, among other functions, and a minor percentage represented genes of bacterial origin. We classified these phages into four genomic types on the basis of their genomic similarity, gene content, and host range. Since there are no reports of similar sequences, we propose that these bacteriophages correspond to novel species. PMID:24185856

  11. Outbreak Investigation Using High-Throughput Genome Sequencing within a Diagnostic Microbiology Laboratory

    PubMed Central

    Sherry, Norelle L.; Porter, Jessica L.; Seemann, Torsten; Watkins, Andrew; Stinear, Timothy P.

    2013-01-01

    Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also determined that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase. Comparison of the outbreak strains to all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for the reagents only). The main obstacles to a mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in the field of diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants. Clinical microbiologists need to familiarize themselves with these technologies and their applications. PMID:23408689

  12. Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds

    DOE PAGES

    Jiao, Jian-Yu; Carro, Lorena; Liu, Lan; ...

    2017-02-03

    Jiangella gansuensis strain YIM 002 T is the type strain of the type species of the genus Jiangella, which is at the present time composed of five species, and was isolated from desert soil sample in Gansu Province (China). The five strains of this genus are clustered in a monophyletic group when closer actinobacterial genera are used to infer a 16S rRNA gene sequence phylogeny. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strainmore » YIM 002T contains a single scaffold of size 5,585,780 bp, which involves 149 pseudogenes, 4905 protein-coding genes and 50 RNA genes, including 2520 hypothetical proteins and 4 rRNA genes. From the investigation of genome sizes of Jiangella species, J. gansuensis shows a smaller size, which indicates this strain might have discarded too much genetic information to adapt to desert environment. Seven new compounds from this bacterium have recently been described; however, its potential should be higher, as secondary metabolite gene cluster analysis predicted 60 gene clusters, including the potential to produce the pristinamycin.« less

  13. Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiao, Jian-Yu; Carro, Lorena; Liu, Lan

    Jiangella gansuensis strain YIM 002 T is the type strain of the type species of the genus Jiangella, which is at the present time composed of five species, and was isolated from desert soil sample in Gansu Province (China). The five strains of this genus are clustered in a monophyletic group when closer actinobacterial genera are used to infer a 16S rRNA gene sequence phylogeny. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strainmore » YIM 002T contains a single scaffold of size 5,585,780 bp, which involves 149 pseudogenes, 4905 protein-coding genes and 50 RNA genes, including 2520 hypothetical proteins and 4 rRNA genes. From the investigation of genome sizes of Jiangella species, J. gansuensis shows a smaller size, which indicates this strain might have discarded too much genetic information to adapt to desert environment. Seven new compounds from this bacterium have recently been described; however, its potential should be higher, as secondary metabolite gene cluster analysis predicted 60 gene clusters, including the potential to produce the pristinamycin.« less

  14. The Use of a Combined Bioinformatics Approach to Locate Antibiotic Resistance Genes on Plasmids From Whole Genome Sequences of Salmonella enterica Serovars From Humans in Ghana.

    PubMed

    Kudirkiene, Egle; Andoh, Linda A; Ahmed, Shahana; Herrero-Fresno, Ana; Dalsgaard, Anders; Obiri-Danso, Kwasi; Olsen, John E

    2018-01-01

    In the current study, we identified plasmids carrying antimicrobial resistance genes in draft whole genome sequences of 16 selected Salmonella enterica isolates representing six different serovars from humans in Ghana. The plasmids and the location of resistance genes in the genomes were predicted using a combination of PlasmidFinder, ResFinder, plasmidSPAdes and BLAST genomic analysis tools. Subsequently, S1-PFGE was employed for analysis of plasmid profiles. Whole genome sequencing confirmed the presence of antimicrobial resistance genes in Salmonella isolates showing multidrug resistance phenotypically. ESBL, either bla TEM52-B or bla CTX-M15 were present in two cephalosporin resistant isolates of S . Virchow and S . Poona, respectively. The systematic genome analysis revealed the presence of different plasmids in different serovars, with or without insertion of antimicrobial resistance genes. In S . Enteritidis, resistance genes were carried predominantly on plasmids of IncN type, in S . Typhimurium on plasmids of IncFII(S)/IncFIB(S)/IncQ1 type. In S . Virchow and in S . Poona, resistance genes were detected on plasmids of IncX1 and TrfA/IncHI2/IncHI2A type, respectively. The latter two plasmids were described for the first time in these serovars. The combination of genomic analytical tools allowed nearly full mapping of the resistance plasmids in all Salmonella strains analyzed. The results suggest that the improved analytical approach used in the current study may be used to identify plasmids that are specifically associated with resistance phenotypes in whole genome sequences. Such knowledge would allow the development of rapid multidrug resistance tracking tools in Salmonella populations using WGS.

  15. Complete genome sequence of the Campylobacter helveticus type strain ATCC 51209T

    USDA-ARS?s Scientific Manuscript database

    Campylobacter helveticus has been isolated from domestic dogs and cats. Although C. helveticus is closely related to the emerging human pathogen C. upsaliensis, no C. helveticus-associated cases of human illness have been reported. This study describes the whole-genome sequence of the C. helveticus ...

  16. Genome sequence and annotation of Trichoderma parareesei, the ancestor of the cellulase producer Trichoderma reesei

    DOE PAGES

    Yang, Dongqing; Pomraning, Kyle; Kopchinskiy, Alexey; ...

    2015-08-13

    The filamentous fungus Trichoderma parareesei is the asexually reproducing ancestor of Trichoderma reesei, the holomorphic industrial producer of cellulase and hemicellulase. Here, we present the genome sequence of the T. parareesei type strain CBS 125925, which contains genes for 9,318 proteins.

  17. Complete genome sequences of two acetylene-fermenting Pelobacter acetylenicus strains

    USGS Publications Warehouse

    Sutton, John M.; Baesman, Shaun; Fierst, Janna L.; Poret-Peterson, Amisha T.; Oremland, Ronald S.; Dunlap, Darren S.; Akob, Denise M.

    2017-01-01

    Acetylene fermentation is a rare metabolism that was serendipitously discovered during C2H2-block assays of N2O reductase. Here, we report the genome sequences of two type strains of acetylene-fermenting Pelobacter acetylenicus, the freshwater bacterium DSM 3246 and the estuarine bacterium DSM 3247.

  18. Complete Genome Sequences of Lactobacillus johnsonii Strain N6.2 and Lactobacillus reuteri Strain TD1.

    PubMed

    Leonard, Michael T; Valladares, Ricardo B; Ardissone, Alexandria; Gonzalez, Claudio F; Lorca, Graciela L; Triplett, Eric W

    2014-05-08

    We report here the complete genome sequences of Lactobacillus johnsonii strain N6.2, a homofermentative lactic acid intestinal bacterium, and Lactobacillus reuteri strain TD1, a heterofermentative lactic acid intestinal bacterium, both isolated from a type 1 diabetes-resistant rat model.

  19. Complete genome sequence of the first human parechovirus type 3 isolated in Taiwan.

    PubMed

    Chang, Jenn-Tzong; Yang, Chih-Shiang; Chen, Bao-Chen; Chen, Yao-Shen; Chang, Tsung-Hsien

    2017-11-01

    The first human parechovirus 3 (HPeV3 VGHKS-2007) in Taiwan was identified from a clinical specimen from a male infant. The entire genome of the HPeV3 isolate was sequenced and compared to known HPeV3 sequences. Genome alignment data showed that HPeV3 VGHKS-2007 shares the highest nucleotide identity, 99%, with the Japanese strain of HPeV3 1361K-162589-Yamagata-2008. All HPeV3 isolates possess at least 97% amino acid identity. The analysis of the genome sequence of HPeV3 VGHKS-2007 will facilitate future investigations of the epidemiology and pathogenicity of HPeV3 infection. Copyright © 2017. Published by Elsevier Taiwan LLC.

  20. Defining a Core Genome Multilocus Sequence Typing Scheme for the Global Epidemiology of Vibrio parahaemolyticus

    PubMed Central

    Jolley, Keith A.; Reed, Elizabeth; Martinez-Urtaza, Jaime

    2017-01-01

    ABSTRACT Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage. PMID:28330888

  1. Features of the genomes of microsporidia in mosquitoes:status and preliminary findings

    USDA-ARS?s Scientific Manuscript database

    The status and preliminary findings for full genome sequencing of two distantly related species of microsporidia with mosquitoes as type hosts will be presented. Vavraia culicis, the type species of the genus Vavraia, was originally described from Culex pipiens. Type material was not available and...

  2. NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data.

    PubMed

    Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug

    2016-01-01

    The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data.

  3. High-quality draft genome sequence of Gracilimonas tropica CL-CB462 T (DSM 19535 T), isolated from a Synechococcus culture

    DOE PAGES

    Choi, Dong Han; Ahn, Chisang; Jang, Gwang Il; ...

    2015-11-11

    Gracilimonas tropica Choi et al. 2009 is a member of order Sphingobacteriales, class Sphingobacteriia. Three species of the genus Gracilimonas have been isolated from marine seawater or a salt mine and showed extremely halotolerant and mesophilic features, although close relatives are extremely halophilic or thermophilic. The type strain of the type species of Gracilimonas, G. tropica DSM19535 T, was isolated from a Synechococcus culture which was established from the tropical sea-surface water of the Pacific Ocean. The genome of the strain DSM19535 T was sequenced through the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project.more » Here, we describe the genomic features of the strain. The 3,831,242 bp long draft genome consists of 48 contigs with 3373 protein-coding and 53 RNA genes. Finally, the strain seems to adapt to phosphate limitation and requires amino acids from external environment. In addition, genomic analyses and pasteurization experiment suggested that G. tropica DSM19535 T did not form spore.« less

  4. Mapping Ribonucleotides Incorporated into DNA by Hydrolytic End-Sequencing.

    PubMed

    Orebaugh, Clinton D; Lujan, Scott A; Burkholder, Adam B; Clausen, Anders R; Kunkel, Thomas A

    2018-01-01

    Ribonucleotides embedded within DNA render the DNA sensitive to the formation of single-stranded breaks under alkali conditions. Here, we describe a next-generation sequencing method called hydrolytic end sequencing (HydEn-seq) to map ribonucleotides inserted into the genome of Saccharomyce cerevisiae strains deficient in ribonucleotide excision repair. We use this method to map several genomic features in wild-type and replicase variant yeast strains.

  5. Non-Homologous End Joining and Homology Directed DNA Repair Frequency of Double-Stranded Breaks Introduced by Genome Editing Reagents.

    PubMed

    Zaboikin, Michail; Zaboikina, Tatiana; Freter, Carl; Srinivasakumar, Narasimhachar

    2017-01-01

    Genome editing using transcription-activator like effector nucleases or RNA guided nucleases allows one to precisely engineer desired changes within a given target sequence. The genome editing reagents introduce double stranded breaks (DSBs) at the target site which can then undergo DNA repair by non-homologous end joining (NHEJ) or homology directed recombination (HDR) when a template DNA molecule is available. NHEJ repair results in indel mutations at the target site. As PCR amplified products from mutant target regions are likely to exhibit different melting profiles than PCR products amplified from wild type target region, we designed a high resolution melting analysis (HRMA) for rapid identification of efficient genome editing reagents. We also designed TaqMan assays using probes situated across the cut site to discriminate wild type from mutant sequences present after genome editing. The experiments revealed that the sensitivity of the assays to detect NHEJ-mediated DNA repair could be enhanced by selection of transfected cells to reduce the contribution of unmodified genomic DNA from untransfected cells to the DNA melting profile. The presence of donor template DNA lacking the target sequence at the time of genome editing further enhanced the sensitivity of the assays for detection of mutant DNA molecules by excluding the wild-type sequences modified by HDR. A second TaqMan probe that bound to an adjacent site, outside of the primary target cut site, was used to directly determine the contribution of HDR to DNA repair in the presence of the donor template sequence. The TaqMan qPCR assay, designed to measure the contribution of NHEJ and HDR in DNA repair, corroborated the results from HRMA. The data indicated that genome editing reagents can produce DSBs at high efficiency in HEK293T cells but a significant proportion of these are likely masked by reversion to wild type as a result of HDR. Supplying a donor plasmid to provide a template for HDR (that eliminates a PCR amplifiable target) revealed these cryptic DSBs and facilitated the determination of the true efficacy of genome editing reagents. The results indicated that in HEK293T cells, approximately 40% of the DSBs introduced by genome editing, were available for participation in HDR.

  6. Core Genome Multilocus Sequence Typing Scheme for High-Resolution Typing of Enterococcus faecium

    PubMed Central

    de Been, Mark; Pinholt, Mette; Top, Janetta; Bletz, Stefan; van Schaik, Willem; Brouwer, Ellen; Rogers, Malbert; Kraat, Yvette; Bonten, Marc; Corander, Jukka; Westh, Henrik; Harmsen, Dag

    2015-01-01

    Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks. PMID:26400782

  7. Core Genome Multilocus Sequence Typing Scheme for High- Resolution Typing of Enterococcus faecium.

    PubMed

    de Been, Mark; Pinholt, Mette; Top, Janetta; Bletz, Stefan; Mellmann, Alexander; van Schaik, Willem; Brouwer, Ellen; Rogers, Malbert; Kraat, Yvette; Bonten, Marc; Corander, Jukka; Westh, Henrik; Harmsen, Dag; Willems, Rob J L

    2015-12-01

    Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism(SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks.

  8. Mycobacterium leprae genomes from a British medieval leprosy hospital: towards understanding an ancient epidemic.

    PubMed

    Mendum, Tom A; Schuenemann, Verena J; Roffey, Simon; Taylor, G Michael; Wu, Huihai; Singh, Pushpendra; Tucker, Katie; Hinds, Jason; Cole, Stewart T; Kierzek, Andrzej M; Nieselt, Kay; Krause, Johannes; Stewart, Graham R

    2014-04-08

    Leprosy has afflicted humankind throughout history leaving evidence in both early texts and the archaeological record. In Britain, leprosy was widespread throughout the Middle Ages until its gradual and unexplained decline between the 14th and 16th centuries. The nature of this ancient endemic leprosy and its relationship to modern strains is only partly understood. Modern leprosy strains are currently divided into 5 phylogenetic groups, types 0 to 4, each with strong geographical links. Until recently, European strains, both ancient and modern, were thought to be exclusively type 3 strains. However, evidence for type 2 strains, a group normally associated with Central Asia and the Middle East, has recently been found in archaeological samples in Scandinavia and from two skeletons from the medieval leprosy hospital (or leprosarium) of St Mary Magdalen, near Winchester, England. Here we report the genotypic analysis and whole genome sequencing of two further ancient M. leprae genomes extracted from the remains of two individuals, Sk14 and Sk27, that were excavated from 10th-12th century burials at the leprosarium of St Mary Magdalen. DNA was extracted from the surfaces of bones showing osteological signs of leprosy. Known M. leprae polymorphisms were PCR amplified and Sanger sequenced, while draft genomes were generated by enriching for M. leprae DNA, and Illumina sequencing. SNP-typing and phylogenetic analysis of the draft genomes placed both of these ancient strains in the conserved type 2 group, with very few novel SNPs compared to other ancient or modern strains. The genomes of the two newly sequenced M. leprae strains group firmly with other type 2F strains. Moreover, the M. leprae strain most closely related to one of the strains, Sk14, in the worldwide phylogeny is a contemporaneous ancient St Magdalen skeleton, vividly illustrating the epidemic and clonal nature of leprosy at this site. The prevalence of these type 2 strains indicates that type 2F strains, in contrast to later European and associated North American type 3 isolates, may have been the co-dominant or even the predominant genotype at this location during the 11th century.

  9. Comparison of advanced whole genome sequence-based methods to distinguish strains of Salmonella enterica serovar Heidelberg involved in foodborne outbreaks in Québec.

    PubMed

    Vincent, Caroline; Usongo, Valentine; Berry, Chrystal; Tremblay, Denise M; Moineau, Sylvain; Yousfi, Khadidja; Doualla-Bell, Florence; Fournier, Eric; Nadon, Céline; Goodridge, Lawrence; Bekal, Sadjia

    2018-08-01

    Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. This serovar ranks second and third among serovars that cause human infections in Québec and Canada, respectively, and has been associated with severe infections. Traditional typing methods such as PFGE do not display adequate discrimination required to resolve outbreak investigations due to the low level of genetic diversity of isolates belonging to this serovar. This study evaluates the ability of four whole genome sequence (WGS)-based typing methods to differentiate among 145 S. Heidelberg strains involved in four distinct outbreak events and sporadic cases of salmonellosis that occurred in Québec between 2007 and 2016. Isolates from all outbreaks were indistinguishable by PFGE. The core genome single nucleotide variant (SNV), core genome multilocus sequence typing (MLST) and whole genome MLST approaches were highly discriminatory and separated outbreak strains into four distinct phylogenetic clusters that were concordant with the epidemiological data. The clustered regularly interspaced short palindromic repeats (CRISPR) typing method was less discriminatory. However, CRISPR typing may be used as a secondary method to differentiate isolates of S. Heidelberg that are genetically similar but epidemiologically unrelated to outbreak events. WGS-based typing methods provide a highly discriminatory alternative to PFGE for the laboratory investigation of foodborne outbreaks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. New families of site-specific repetitive DNA sequences that comprise constitutive heterochromatin of the Syrian hamster (Mesocricetus auratus, Cricetinae, Rodentia).

    PubMed

    Yamada, Kazuhiko; Kamimura, Eikichi; Kondo, Mariko; Tsuchiya, Kimiyuki; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2006-02-01

    We molecularly cloned new families of site-specific repetitive DNA sequences from BglII- and EcoRI-digested genomic DNA of the Syrian hamster (Mesocricetus auratus, Cricetrinae, Rodentia) and characterized them by chromosome in situ hybridization and filter hybridization. They were classified into six different types of repetitive DNA sequence families according to chromosomal distribution and genome organization. The hybridization patterns of the sequences were consistent with the distribution of C-positive bands and/or Hoechst-stained heterochromatin. The centromeric major satellite DNA and sex chromosome-specific and telomeric region-specific repetitive sequences were conserved in the same genus (Mesocricetus) but divergent in different genera. The chromosome-2-specific sequence was conserved in two genera, Mesocricetus and Cricetulus, and a low copy number of repetitive sequences on the heterochromatic chromosome arms were conserved in the subfamily Cricetinae but not in the subfamily Calomyscinae. By contrast, the other type of repetitive sequences on the heterochromatic chromosome arms, which had sequence similarities to a LINE sequence of rodents, was conserved through the three subfamilies, Cricetinae, Calomyscinae and Murinae. The nucleotide divergence of the repetitive sequences of heterochromatin was well correlated with the phylogenetic relationships of the Cricetinae species, and each sequence has been independently amplified and diverged in the same genome.

  11. Genomic Definition of Hypervirulent and Multidrug-Resistant Klebsiella pneumoniae Clonal Groups

    PubMed Central

    Bialek-Davenet, Suzanne; Criscuolo, Alexis; Ailloud, Florent; Passet, Virginie; Jones, Louis; Delannoy-Vieillard, Anne-Sophie; Garin, Benoit; Le Hello, Simon; Arlet, Guillaume; Nicolas-Chanoine, Marie-Hélène; Decré, Dominique

    2014-01-01

    Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected. PMID:25341126

  12. The Use of Weighted Graphs for Large-Scale Genome Analysis

    PubMed Central

    Zhou, Fang; Toivonen, Hannu; King, Ross D.

    2014-01-01

    There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061

  13. Analyzing Somatic Genome Rearrangements in Human Cancers by Using Whole-Exome Sequencing | Office of Cancer Genomics

    Cancer.gov

    Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions.

  14. Complete Genome Sequence of Aggregatibacter (Haemophilus) aphrophilus NJ8700▿

    PubMed Central

    Di Bonaventura, Maria Pia; DeSalle, Rob; Pop, Mihai; Nagarajan, Niranjan; Figurski, David H.; Fine, Daniel H.; Kaplan, Jeffrey B.; Planet, Paul J.

    2009-01-01

    We report the finished and annotated genome sequence of Aggregatibacter aphrophilus strain NJ8700, a strain isolated from the oral flora of a healthy individual, and discuss characteristics that may affect its dual roles in human health and disease. This strain has a rough appearance, and its genome contains genes encoding a type VI secretion system and several factors that may participate in host colonization. PMID:19447908

  15. An Outbreak of Streptococcus pyogenes in a Mental Health Facility: Advantage of Well-Timed Whole-Genome Sequencing Over emm Typing.

    PubMed

    Bergin, Sarah M; Periaswamy, Balamurugan; Barkham, Timothy; Chua, Hong Choon; Mok, Yee Ming; Fung, Daniel Shuen Sheng; Su, Alex Hsin Chuan; Lee, Yen Ling; Chua, Ming Lai Ivan; Ng, Poh Yong; Soon, Wei Jia Wendy; Chu, Collins Wenhan; Tan, Siyun Lucinda; Meehan, Mary; Ang, Brenda Sze Peng; Leo, Yee Sin; Holden, Matthew T G; De, Partha; Hsu, Li Yang; Chen, Swaine L; de Sessions, Paola Florez; Marimuthu, Kalisvar

    2018-05-09

    OBJECTIVEWe report the utility of whole-genome sequencing (WGS) conducted in a clinically relevant time frame (ie, sufficient for guiding management decision), in managing a Streptococcus pyogenes outbreak, and present a comparison of its performance with emm typing.SETTINGA 2,000-bed tertiary-care psychiatric hospital.METHODSActive surveillance was conducted to identify new cases of S. pyogenes. WGS guided targeted epidemiological investigations, and infection control measures were implemented. Single-nucleotide polymorphism (SNP)-based genome phylogeny, emm typing, and multilocus sequence typing (MLST) were performed. We compared the ability of WGS and emm typing to correctly identify person-to-person transmission and to guide the management of the outbreak.RESULTSThe study included 204 patients and 152 staff. We identified 35 patients and 2 staff members with S. pyogenes. WGS revealed polyclonal S. pyogenes infections with 3 genetically distinct phylogenetic clusters (C1-C3). Cluster C1 isolates were all emm type 4, sequence type 915 and had pairwise SNP differences of 0-5, which suggested recent person-to-person transmissions. Epidemiological investigation revealed that cluster C1 was mediated by dermal colonization and transmission of S. pyogenes in a male residential ward. Clusters C2 and C3 were genomically diverse, with pairwise SNP differences of 21-45 and 26-58, and emm 11 and mostly emm120, respectively. Clusters C2 and C3, which may have been considered person-to-person transmissions by emm typing, were shown by WGS to be unlikely by integrating pairwise SNP differences with epidemiology.CONCLUSIONSWGS had higher resolution than emm typing in identifying clusters with recent and ongoing person-to-person transmissions, which allowed implementation of targeted intervention to control the outbreak.Infect Control Hosp Epidemiol 2018;1-9.

  16. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genome of relative organisms as mediators for comparing WTmore » and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.« less

  17. Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries.

    PubMed

    Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W

    2018-05-01

    The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.

  18. Component identification of electron transport chains in curdlan-producing Agrobacterium sp. ATCC 31749 and its genome-specific prediction using comparative genome and phylogenetic trees analysis.

    PubMed

    Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang

    2011-06-01

    Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in large scale biotechnological production of curdlan. However, little is known about its genomic information. DNA partial sequence of electron transport chains (ETCs) protein genes were obtained in order to understand the components of ETC and genomic-specificity in Agrobacterium sp. ATCC 31749. Degenerate primers were designed according to ETC conserved sequences in other reported species. DNA partial sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by the PCR method using degenerate primers. Based on comparative genomic analysis, nine electron transport elements were ascertained, including NADH ubiquinone oxidoreductase, succinate dehydrogenase complex II, complex III, cytochrome c, ubiquinone biosynthesis protein ubiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb (3)-type terminal oxidase and cytochrome caa (3)-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.

  19. Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

    PubMed Central

    S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali

    2014-01-01

    Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes or transcriptomes at a much faster rate with a lower cost. Genomic-based studies are in a relatively slow pace in India due to the non-availability of genomics experts, trained personnel and dedicated service providers. Using NGS there is a lot of potential to study India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics service to scientists, to train researchers and also work on national and international genomic projects. We have HiSeq1000 from Illumina and GS-FLX Plus from Roche454. The long reads from GS FLX Plus, and high sequence depth from HiSeq1000, are the best and ideal hybrid approaches for de novo and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising of more than 388 genomes and 615 transcriptomes – prokaryotes and eukaryotes (fungi, plants and animals). In addition we have optimized other unique applications such as small RNA (miRNA, siRNA etc), long Mate-pair sequencing (2 to 20 Kb), Coding sequences (Exome), Methylome (ChIP-Seq), Restriction Mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons, etc. Translating DNA sequence data from NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data such as genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis etc. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting several publications in peer-reviewed journals and several genomic/transcriptomic data is available at NCBI.

  20. High-quality permanent draft genome sequence of the extremely osmotolerant diphenol degrading bacterium Halotalea alkalilenta AW-7T, and emended description of the genus Halotalea

    DOE PAGES

    Ntougias, Spyridon; Lapidus, Alla; Copeland, Alex; ...

    2015-08-13

    Members of the genus Halotalea (family Halomonadaceae) are of high significance since they can tolerate the greatest glucose and maltose concentrations ever reported for known bacteria and are involved in the degradation of industrial effluents. Here, the characteristics and the permanent-draft genome sequence and annotation of Halotalea alkalilenta AW-7T are described. The microorganism was sequenced as a part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project at the DOE Joint Genome Institute, and it is the only strain within the genus Halotalea having its genome sequenced. The genome is 4,467,826 bp longmore » and consists of 40 scaffolds with 64.62 % average GC content. A total of 4,104 genes were predicted, comprising of 4,028 protein-coding and 76 RNA genes. Most protein-coding genes (87.79 %) were assigned to a putative function. Halotalea alkalilenta AW-7T encodes the catechol and protocatechuate degradation to β-ketoadipate via the β-ketoadipate and protocatechuate ortho-cleavage degradation pathway, and it possesses the genetic ability to detoxify fluoroacetate, cyanate and acrylonitrile. Lastly, an emended description of the genus Halotalea Ntougias et al. 2007 is also provided in order to describe the delayed fermentation ability of the type strain.« less

  1. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012more » alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~;;232831 genes and ~;;15011 multigene families, All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.« less

  2. Mind the gap; seven reasons to close fragmented genome assemblies.

    PubMed

    Thomma, Bart P H J; Seidl, Michael F; Shi-Kunne, Xiaoqian; Cook, David E; Bolton, Melvin D; van Kan, Jan A L; Faino, Luigi

    2016-05-01

    Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including those that study non-model organisms. Thus, hundreds of fungal genomes have been sequenced and are publically available today, although these initiatives have typically yielded considerably fragmented genome assemblies that often lack large contiguous genomic regions. Many important genomic features are contained in intergenic DNA that is often missing in current genome assemblies, and recent studies underscore the significance of non-coding regions and repetitive elements for the life style, adaptability and evolution of many organisms. The study of particular types of genetic elements, such as telomeres, centromeres, repetitive elements, effectors, and clusters of co-regulated genes, but also of phenomena such as structural rearrangements, genome compartmentalization and epigenetics, greatly benefits from having a contiguous and high-quality, preferably even complete and gapless, genome assembly. Here we discuss a number of important reasons to produce gapless, finished, genome assemblies to help answer important biological questions. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. The complete genome sequence of the acarbose producer Actinoplanes sp. SE50/110

    PubMed Central

    2012-01-01

    Background Actinoplanes sp. SE50/110 is known as the wild type producer of the alpha-glucosidase inhibitor acarbose, a potent drug used worldwide in the treatment of type-2 diabetes mellitus. As the incidence of diabetes is rapidly rising worldwide, an ever increasing demand for diabetes drugs, such as acarbose, needs to be anticipated. Consequently, derived Actinoplanes strains with increased acarbose yields are being used in large scale industrial batch fermentation since 1990 and were continuously optimized by conventional mutagenesis and screening experiments. This strategy reached its limits and is generally superseded by modern genetic engineering approaches. As a prerequisite for targeted genetic modifications, the complete genome sequence of the organism has to be known. Results Here, we present the complete genome sequence of Actinoplanes sp. SE50/110 [GenBank:CP003170], the first publicly available genome of the genus Actinoplanes, comprising various producers of pharmaceutically and economically important secondary metabolites. The genome features a high mean G + C content of 71.32% and consists of one circular chromosome with a size of 9,239,851 bp hosting 8,270 predicted protein coding sequences. Phylogenetic analysis of the core genome revealed a rather distant relation to other sequenced species of the family Micromonosporaceae whereas Actinoplanes utahensis was found to be the closest species based on 16S rRNA gene sequence comparison. Besides the already published acarbose biosynthetic gene cluster sequence, several new non-ribosomal peptide synthetase-, polyketide synthase- and hybrid-clusters were identified on the Actinoplanes genome. Another key feature of the genome represents the discovery of a functional actinomycete integrative and conjugative element. Conclusions The complete genome sequence of Actinoplanes sp. SE50/110 marks an important step towards the rational genetic optimization of the acarbose production. In this regard, the identified actinomycete integrative and conjugative element could play a central role by providing the basis for the development of a genetic transformation system for Actinoplanes sp. SE50/110 and other Actinoplanes spp. Furthermore, the identified non-ribosomal peptide synthetase- and polyketide synthase-clusters potentially encode new antibiotics and/or other bioactive compounds, which might be of pharmacologic interest. PMID:22443545

  4. The complete genome sequence of the acarbose producer Actinoplanes sp. SE50/110.

    PubMed

    Schwientek, Patrick; Szczepanowski, Rafael; Rückert, Christian; Kalinowski, Jörn; Klein, Andreas; Selber, Klaus; Wehmeier, Udo F; Stoye, Jens; Pühler, Alfred

    2012-03-23

    Actinoplanes sp. SE50/110 is known as the wild type producer of the alpha-glucosidase inhibitor acarbose, a potent drug used worldwide in the treatment of type-2 diabetes mellitus. As the incidence of diabetes is rapidly rising worldwide, an ever increasing demand for diabetes drugs, such as acarbose, needs to be anticipated. Consequently, derived Actinoplanes strains with increased acarbose yields are being used in large scale industrial batch fermentation since 1990 and were continuously optimized by conventional mutagenesis and screening experiments. This strategy reached its limits and is generally superseded by modern genetic engineering approaches. As a prerequisite for targeted genetic modifications, the complete genome sequence of the organism has to be known. Here, we present the complete genome sequence of Actinoplanes sp. SE50/110 [GenBank:CP003170], the first publicly available genome of the genus Actinoplanes, comprising various producers of pharmaceutically and economically important secondary metabolites. The genome features a high mean G + C content of 71.32% and consists of one circular chromosome with a size of 9,239,851 bp hosting 8,270 predicted protein coding sequences. Phylogenetic analysis of the core genome revealed a rather distant relation to other sequenced species of the family Micromonosporaceae whereas Actinoplanes utahensis was found to be the closest species based on 16S rRNA gene sequence comparison. Besides the already published acarbose biosynthetic gene cluster sequence, several new non-ribosomal peptide synthetase-, polyketide synthase- and hybrid-clusters were identified on the Actinoplanes genome. Another key feature of the genome represents the discovery of a functional actinomycete integrative and conjugative element. The complete genome sequence of Actinoplanes sp. SE50/110 marks an important step towards the rational genetic optimization of the acarbose production. In this regard, the identified actinomycete integrative and conjugative element could play a central role by providing the basis for the development of a genetic transformation system for Actinoplanes sp. SE50/110 and other Actinoplanes spp. Furthermore, the identified non-ribosomal peptide synthetase- and polyketide synthase-clusters potentially encode new antibiotics and/or other bioactive compounds, which might be of pharmacologic interest.

  5. Whole-Genome Sequencing of Recent Listeria monocytogenes Isolates from Germany Reveals Population Structure and Disease Clusters.

    PubMed

    Halbedel, Sven; Prager, Rita; Fuchs, Stephan; Trost, Eva; Werner, Guido; Flieger, Antje

    2018-06-01

    Listeria monocytogenes causes foodborne outbreaks with high mortality. For improvement of outbreak cluster detection, the German consiliary laboratory for listeriosis implemented whole-genome sequencing (WGS) in 2015. A total of 424 human L. monocytogenes isolates collected in 2007 to 2017 were subjected to WGS and core-genome multilocus sequence typing (cgMLST). cgMLST grouped the isolates into 38 complexes, reflecting 4 known and 34 unknown disease clusters. Most of these complexes were confirmed by single nucleotide polymorphism (SNP) calling, but some were further differentiated. Interestingly, several cgMLST cluster types were further subtyped by pulsed-field gel electrophoresis, partly due to phage insertions in the accessory genome. Our results highlight the usefulness of cgMLST for routine cluster detection but also show that cgMLST complexes require validation by methods providing higher typing resolution. Twelve cgMLST clusters included recent cases, suggesting activity of the source. Therefore, the cgMLST nomenclature data presented here may support future public health actions. Copyright © 2018 American Society for Microbiology.

  6. Comparative analysis of the complete genome of an epidemic hospital sequence type 203 clone of vancomycin-resistant Enterococcus faecium

    PubMed Central

    2013-01-01

    Background In this report we have explored the genomic and microbiological basis for a sustained increase in bloodstream infections at a major Australian hospital caused by Enterococcus faecium multi-locus sequence type (ST) 203, an outbreak strain that has largely replaced a predecessor ST17 sequence type. Results To establish a ST203 reference sequence we fully assembled and annotated the genome of Aus0085, a 2009 vancomycin-resistant Enterococcus faecium (VREfm) bloodstream isolate, and the first example of a completed ST203 genome. Aus0085 has a 3.2 Mb genome, comprising a 2.9 Mb circular chromosome and six circular plasmids (2 kb–130 kb). Twelve percent of the 3222 coding sequences (CDS) in Aus0085 are not present in ST17 E. faecium Aus0004 and ST18 E. faecium TX16. Extending this comparison to an additional 12 ST17 and 14 ST203 E. faecium hospital isolate genomes revealed only six genomic regions spanning 41 kb that were present in all ST203 and absent from all ST17 genomes. The 40 CDS have predicted functions that include ion transport, riboflavin metabolism and two phosphotransferase systems. Comparison of the vancomycin resistance-conferring Tn1549 transposon between Aus0004 and Aus0085 revealed differences in transposon length and insertion site, and van locus sequence variation that correlated with a higher vancomycin MIC in Aus0085. Additional phenotype comparisons between ST17 and ST203 isolates showed that while there were no differences in biofilm-formation and killing of Galleria mellonella, ST203 isolates grew significantly faster and out-competed ST17 isolates in growth assays. Conclusions Here we have fully assembled and annotated the first ST203 genome, and then characterized the genomic differences between ST17 and ST203 E. faecium. We also show that ST203 E. faecium are faster growing and can out-compete ST17 E. faecium. While a causal genetic basis for these phenotype differences is not provided here, this study revealed conserved genetic differences between the two clones, differences that can now be tested to explain the molecular basis for the success and emergence of ST203 E. faecium. PMID:24004955

  7. Whole genome sequencing identifies influenza A H3N2 transmission and offers superior resolution to classical typing methods.

    PubMed

    Meinel, Dominik M; Heinzinger, Susanne; Eberle, Ute; Ackermann, Nikolaus; Schönberger, Katharina; Sing, Andreas

    2018-02-01

    Influenza with its annual epidemic waves is a major cause of morbidity and mortality worldwide. However, only little whole genome data are available regarding the molecular epidemiology promoting our understanding of viral spread in human populations. We implemented a RT-PCR strategy starting from patient material to generate influenza A whole genome sequences for molecular epidemiological surveillance. Samples were obtained within the Bavarian Influenza Sentinel. The complete influenza virus genome was amplified by a one-tube multiplex RT-PCR and sequenced on an Illumina MiSeq. We report whole genomic sequences for 50 influenza A H3N2 viruses, which was the predominating virus in the season 2014/15, directly from patient specimens. The dataset included random samples from Bavaria (Germany) throughout the influenza season and samples from three suspected transmission clusters. We identified the outbreak samples based on sequence identity. Whole genome sequencing (WGS) was superior in resolution compared to analysis of single segments or partial segment analysis. Additionally, we detected manifestation of substantial amounts of viral quasispecies in several patients, carrying mutations varying from the dominant virus in each patient. Our rapid whole genome sequencing approach for influenza A virus shows that WGS can effectively be used to detect and understand outbreaks in large communities. Additionally, the genomic data provide in-depth details about the circulating virus within one season.

  8. Carnivore-specific SINEs (Can-SINEs): distribution, evolution, and genomic impact.

    PubMed

    Walters-Conte, Kathryn B; Johnson, Diana L E; Allard, Marc W; Pecon-Slattery, Jill

    2011-01-01

    Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has, has aided comparative genomic studies by providing rare genomic changes, and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity with Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as describes composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.

  9. Carnivore-Specific SINEs (Can-SINEs): Distribution, Evolution, and Genomic Impact

    PubMed Central

    Johnson, Diana L.E.; Allard, Marc W.; Pecon-Slattery, Jill

    2011-01-01

    Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has, has aided comparative genomic studies by providing rare genomic changes, and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity with Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as describes composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics. PMID:21846743

  10. Comparative genomics of Burkholderia multivorans, a ubiquitous pathogen with a highly conserved genomic structure

    PubMed Central

    Cooper, Vaughn S.; Hatcher, Philip J.; Verheyde, Bart; Carlier, Aurélien; Vandamme, Peter

    2017-01-01

    The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence. PMID:28430818

  11. Standards for Clinical Grade Genomic Databases.

    PubMed

    Yohe, Sophia L; Carter, Alexis B; Pfeifer, John D; Crawford, James M; Cushman-Vokoun, Allison; Caughron, Samuel; Leonard, Debra G B

    2015-11-01

    Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. To define CGGD and the categories of information contained in CGGDs and to frame recommendations for the structure and use of these databases in clinical patient care. Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.

  12. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  13. Prediction of type III secretion signals in genomes of gram-negative bacteria.

    PubMed

    Löwer, Martin; Schneider, Gisbert

    2009-06-15

    Pathogenic bacteria infecting both animals as well as plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in literature its exact characteristics remain unknown. In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification accuracy yielded a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidates protein), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). We present a signal prediction method together with comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).

  14. Complete genome sequence of livestock associated methicillin resistant Staphylococcus aureus ST398 isolated from swine in USA

    USDA-ARS?s Scientific Manuscript database

    Methicillin resistant Staphylococcus aureus colonizes and causes disease in many animal species. Livestock associated-MRSA isolates are represented by isolates of the sequence type 398. These isolates are considered to be livestock adapted. This report provides the complete genome of one swine assoc...

  15. Exploring DNA variant segregation types in pooled genome sequencing enables effective mapping of weeping trait in Malus

    USDA-ARS?s Scientific Manuscript database

    In recent years, next generation sequencing (NGS) based bulked segregant analysis (BSA) has become a powerful approach for allele discovery in non-model plant species. However, challenges remain, particular for out-crossing species with complex genomes. Here, the genetic control of a weeping bran...

  16. Whole genome sequencing and analysis of Campylobacter coli YH502 from retail chicken reveals a plasmid-borne type VI secretion system

    USDA-ARS?s Scientific Manuscript database

    Campylobacter is a major cause of foodborne illnesses worldwide. Campylobacter infections, commonly caused by ingestion of undercooked poultry and meat products, can lead to gastroenteritis and chronic reactive arthritis in humans. Whole genome sequencing (WGS) is a powerful technology that provides...

  17. Whole genome sequence analyses of Xylella fastidiosa PD strains from different geographical regions

    USDA-ARS?s Scientific Manuscript database

    Genome sequences were determined for two Pierce’s disease (PD) causing Xylella fastidiosa (Xf) strains, one from Florida and one from Taiwan. The Florida strain was ATCC 35879, the type of strain used as a standard reference for related taxonomy research. By contrast, the Taiwan strain used was only...

  18. Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis.

    PubMed

    Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun

    2017-10-06

    Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.

  19. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  20. Complete genome sequence of the gliding, heparinolytic Pedobacter saltans type strain (113T)

    PubMed Central

    Liolios, Konstantinos; Sikorski, Johannes; Lu, Meagan; Nolan, Matt; Lapidus, Alla; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Huntemann, Marcel; Ivanova, Natalia; Pagani, Ioanna; Mavromatis, Konstantinos; Ovchinikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Brambilla, Evelyne-Marie; Kotsyurbenko, Oleg; Rohde, Manfred; Tindall, Brian J.; Abt, Birte; Göker, Markus; Detter, John C.; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2011-01-01

    Pedobacter saltans Steyn et al. 1998 is one of currently 32 species in the genus Pedobacter within the family Sphingobacteriaceae. The species is of interest for its isolated location in the tree of life. Like other members of the genus P. saltans is heparinolytic. Cells of P. saltans show a peculiar gliding, dancing motility and can be distinguished from other Pedobacter strains by their ability to utilize glycerol and the inability to assimilate D-cellobiose. The genome presented here is only the second completed genome sequence of a type strain from a member of the family Sphingobacteriaceae to be published. The 4,635,236 bp long genome with its 3,854 protein-coding and 67 RNA genes consists of one chromosome, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22180808

  1. Complete genome sequence of the gliding, heparinolytic Pedobacter saltans type strain (113).

    PubMed

    Liolios, Konstantinos; Sikorski, Johannes; Lu, Meagan; Nolan, Matt; Lapidus, Alla; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Huntemann, Marcel; Ivanova, Natalia; Pagani, Ioanna; Mavromatis, Konstantinos; Ovchinikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Brambilla, Evelyne-Marie; Kotsyurbenko, Oleg; Rohde, Manfred; Tindall, Brian J; Abt, Birte; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-10-15

    Pedobacter saltans Steyn et al. 1998 is one of currently 32 species in the genus Pedobacter within the family Sphingobacteriaceae. The species is of interest for its isolated location in the tree of life. Like other members of the genus P. saltans is heparinolytic. Cells of P. saltans show a peculiar gliding, dancing motility and can be distinguished from other Pedobacter strains by their ability to utilize glycerol and the inability to assimilate D-cellobiose. The genome presented here is only the second completed genome sequence of a type strain from a member of the family Sphingobacteriaceae to be published. The 4,635,236 bp long genome with its 3,854 protein-coding and 67 RNA genes consists of one chromosome, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Genomic signal analysis of pathogen variability

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan

    2006-02-01

    The paper presents results in the study of pathogen variability by using genomic signals. The conversion of symbolic nucleotide sequences into digital signals offers the possibility to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterize small size genomic sequences, such as those found in viruses and bacteria, being a promising tool in tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32], and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene, for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.

  3. Permanent draft genomes of the three Rhodopirellula baltica strains SH28, SWK14 and WH47.

    PubMed

    Richter, Michael; Richter-Heitmann, Tim; Klindworth, Anna; Wegner, Carl-Eric; Frank, Carsten S; Harder, Jens; Glöckner, Frank Oliver

    2014-02-01

    The genomes of three Rhodopirellula baltica strains were sequenced as permanent drafts to complement the full genome sequence of the type strain R. baltica SH1(T). The isolates are part of a larger study to infer the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the first of a series of five publications reporting in total eight new permanent daft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. CuGene as a tool to view and explore genomic data

    NASA Astrophysics Data System (ADS)

    Haponiuk, Michał; Pawełkowicz, Magdalena; Przybecki, Zbigniew; Nowak, Robert M.

    2017-08-01

    Integrated CuGene is an easy-to-use, open-source, on-line tool that can be used to browse, analyze, and query genomic data and annotations. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. It also allows users to upload and display their own experimental results or annotation sets. An important functionality of the application is a possibility to find similarity between sequences by applying four different algorithms of different accuracy. The presented tool was tested on real genomic data and is extensively used by Polish Consortium of Cucumber Genome Sequencing.

  5. Genome sequence of the photoarsenotrophic bacterium Ectothiorhodospira sp. strain BSL-9, isolated from a hypersaline alkaline arsenic-rich extreme environment

    USGS Publications Warehouse

    Hernandez-Maldonado, Jaime; Stoneburner, Brendon; Boren, Alison; Miller, Laurence; Rosen, Michael R.; Oremland, Ronald S.; Saltikov, Chad W

    2016-01-01

    The full genome sequence of Ectothiorhodospira sp. strain BSL-9 is reported here. This purple sulfur bacterium encodes an arxA-type arsenite oxidase within the arxB2AB1CD gene island and is capable of carrying out “photoarsenotrophy” anoxygenic photosynthetic arsenite oxidation. Its genome is composed of 3.5 Mb and has approximately 63% G+C content.

  6. A rural worker infected with a bovine-prevalent genotype of Campylobacter fetus subsp. fetus supports zoonotic transmission and inconsistency of MLST and whole-genome typing.

    PubMed

    Iraola, G; Betancor, L; Calleros, L; Gadea, P; Algorta, G; Galeano, S; Muxi, P; Greif, G; Pérez, R

    2015-08-01

    Whole-genome characterisation in clinical microbiology enables to detect trends in infection dynamics and disease transmission. Here, we report a case of bacteraemia due to Campylobacter fetus subsp. fetus in a rural worker under cancer treatment that was diagnosed with cellulitis; the patient was treated with antibiotics and recovered. The routine typing methods were not able to identify the microorganism causing the infection, so it was further analysed by molecular methods and whole-genome sequencing. The multi-locus sequence typing (MLST) revealed the presence of the bovine-associated ST-4 genotype. Whole-genome comparisons with other C. fetus strains revealed an inconsistent phylogenetic position based on the core genome, discordant with previous ST-4 strains. To the best of our knowledge, this is the first C. fetus subsp. fetus carrying the ST-4 isolated from humans and represents a probable case of zoonotic transmission from cattle.

  7. Sequences of multiple bacterial genomes and a Chlamydia trachomatis genotype from direct sequencing of DNA derived from a vaginal swab diagnostic specimen.

    PubMed

    Andersson, P; Klein, M; Lilliebridge, R A; Giffard, P M

    2013-09-01

    Ultra-deep Illumina sequencing was performed on whole genome amplified DNA derived from a Chlamydia trachomatis-positive vaginal swab. Alignment of reads with reference genomes allowed robust SNP identification from the C. trachomatis chromosome and plasmid. This revealed that the C. trachomatis in the specimen was very closely related to the sequenced urogenital, serovar F, clade T1 isolate F-SW4. In addition, high genome-wide coverage was obtained for Prevotella melaninogenica, Gardnerella vaginalis, Clostridiales genomosp. BVAB3 and Mycoplasma hominis. This illustrates the potential of metagenome data to provide high resolution bacterial typing data from multiple taxa in a diagnostic specimen. ©2013 The Authors Clinical Microbiology and Infection ©2013 European Society of Clinical Microbiology and Infectious Diseases.

  8. Genomics and metagenomics in medical microbiology.

    PubMed

    Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

    2013-12-01

    Over the last two decades, sequencing tools have evolved from laborious time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially using next generation sequencing (NGS) has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but also helped improve diagnosis, typing of pathogen, virulence and antibiotic resistance detection, and development of new vaccines and culture media. In addition, NGS also enabled efficient analysis of complex human micro-floras, both commensal, and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology. Copyright © 2013 Elsevier B.V. All rights reserved.

  9. NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data

    PubMed Central

    Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug

    2016-01-01

    The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data. PMID:26848255

  10. Non-contiguous finished genome sequence and contextual data of the filamentous soil bacterium Ktedonobacter racemifer type strain (SOSP1-21).

    PubMed

    Chang, Yun-Juan; Land, Miriam; Hauser, Loren; Chertkov, Olga; Del Rio, Tijana Glavina; Nolan, Matt; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Ovchinikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Mavromatis, Konstantinos; Liolios, Konstantinos; Brettin, Thomas; Fiebig, Anne; Rohde, Manfred; Abt, Birte; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2011-10-15

    Ktedonobacter racemifer corrig. Cavaletti et al. 2007 is the type species of the genus Ktedonobacter, which in turn is the type genus of the family Ktedonobacteraceae, the type family of the order Ktedonobacterales within the class Ktedonobacteria in the phylum 'Chloroflexi'. Although K. racemifer shares some morphological features with the actinobacteria, it is of special interest because it was the first cultivated representative of a deep branching unclassified lineage of otherwise uncultivated environmental phylotypes tentatively located within the phylum 'Chloroflexi'. The aerobic, filamentous, non-motile, spore-forming Gram-positive heterotroph was isolated from soil in Italy. The 13,661,586 bp long non-contiguous finished genome consists of ten contigs and is the first reported genome sequence from a member of the class Ktedonobacteria. With its 11,453 protein-coding and 87 RNA genes, it is the largest prokaryotic genome reported so far. It comprises a large number of over-represented COGs, particularly genes associated with transposons, causing the genetic redundancy within the genome being considerably larger than expected by chance. This work is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. The Staphylococcus aureus Two-Component System AgrAC Displays Four Distinct Genomic Arrangements That Delineate Genomic Virulence Factor Signatures

    PubMed Central

    Choudhary, Kumari S.; Mih, Nathan; Monk, Jonathan; Kavvas, Erol; Yurkovich, James T.; Sakoulas, George; Palsson, Bernhard O.

    2018-01-01

    Two-component systems (TCSs) consist of a histidine kinase and a response regulator. Here, we evaluated the conservation of the AgrAC TCS among 149 completely sequenced Staphylococcus aureus strains. It is composed of four genes: agrBDCA. We found that: (i) AgrAC system (agr) was found in all but one of the 149 strains, (ii) the agr positive strains were further classified into four agr types based on AgrD protein sequences, (iii) the four agr types not only specified the chromosomal arrangement of the agr genes but also the sequence divergence of AgrC histidine kinase protein, which confers signal specificity, (iv) the sequence divergence was reflected in distinct structural properties especially in the transmembrane region and second extracellular binding domain, and (v) there was a strong correlation between the agr type and the virulence genomic profile of the organism. Taken together, these results demonstrate that bioinformatic analysis of the agr locus leads to a classification system that correlates with the presence of virulence factors and protein structural properties. PMID:29887846

  12. Complete genome sequence of Sebaldella termitidis type strain (NCTC 11300T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harmon-Smith, Miranda; Celia, Laura; Chertkov, Olga

    2010-01-01

    Sebaldella termitidis (Sebald 1962) Collins and Shah 1986, is the only species in the genus Sebaldella within the fusobacterial family Leptotrichiaceae . The sole and type strain of the species was first isolated about 50 years ago from intestinal content of Mediterranean ter-mites. The species is of interest for its very isolated phylogenetic position within the phylum Fusobacteria in the tree of life, with no other species sharing more than 90% 16S rRNA se-quence similarity. The 4,486,650 bp long genome with its 4,210 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Draft genome sequence of Cicer reticulatum L., the wild progenitor of chickpea provides a resource for agronomic trait improvement.

    PubMed

    Gupta, Sonal; Nawaz, Kashif; Parween, Sabiha; Roy, Riti; Sahu, Kamlesh; Kumar Pole, Anil; Khandal, Hitaishi; Srivastava, Rishi; Kumar Parida, Swarup; Chattopadhyay, Debasis

    2017-02-01

    Cicer reticulatum L. is the wild progenitor of the fourth most important legume crop chickpea (C. arietinum L.). We assembled short-read sequences into 416 Mb draft genome of C. reticulatum and anchored 78% (327 Mb) of this assembly to eight linkage groups. Genome annotation predicted 25,680 protein-coding genes covering more than 90% of predicted gene space. The genome assembly shared a substantial synteny and conservation of gene orders with the genome of the model legume Medicago truncatula. Resistance gene homologs of wild and domesticated chickpeas showed high sequence homology and conserved synteny. Comparison of gene sequences and nucleotide diversity using 66 wild and domesticated chickpea accessions suggested that the desi type chickpea was genetically closer to the wild species than the kabuli type. Comparative analyses predicted gene flow between the wild and the cultivated species during domestication. Molecular diversity and population genetic structure determination using 15,096 genome-wide single nucleotide polymorphisms revealed an admixed domestication pattern among cultivated (desi and kabuli) and wild chickpea accessions belonging to three population groups reflecting significant influence of parentage or geographical origin for their cultivar-specific population classification. The assembly and the polymorphic sequence resources presented here would facilitate the study of chickpea domestication and targeted use of wild Cicer germplasms for agronomic trait improvement in chickpea. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. Complete genome sequence of Desulfarculus baarsii type strain (2st14T)

    PubMed Central

    Sun, Hui; Spring, Stefan; Lapidus, Alla; Davenport, Karen; Del Rio, Tijana Glavina; Tice, Hope; Nolan, Matt; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Pagani, Ionna; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Detter, John C.; Han, Cliff; Rohde, Manfred; Brambilla, Evelyne; Göker, Markus; Woyke, Tanja; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Land, Miriam

    2010-01-01

    Desulfarculus baarsii (Widdel 1981) Kuever et al. 2006 is the type and only species of the genus Desulfarculus, which represents the family Desulfarculaceae and the order Desulfarculales. This species is a mesophilic sulfate-reducing bacterium with the capability to oxidize acetate and fatty acids of up to 18 carbon atoms completely to CO2. The acetyl-CoA/CODH (Wood-Ljungdahl) pathway is used by this species for the complete oxidation of carbon sources and autotrophic growth on formate. The type strain 2st14T was isolated from a ditch sediment collected near the University of Konstanz, Germany. This is the first completed genome sequence of a member of the order Desulfarculales. The 3,655,731 bp long single replicon genome with its 3,303 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304732

  15. Genome Sequence of Saccharomyces carlsbergensis, the World’s First Pure Culture Lager Yeast

    PubMed Central

    Walther, Andrea; Hesselbart, Ana; Wendland, Jürgen

    2014-01-01

    Lager yeast beer production was revolutionized by the introduction of pure culture strains. The first established lager yeast strain is known as the bottom fermenting Saccharomyces carlsbergensis, which was originally termed Unterhefe No. 1 by Emil Chr. Hansen and has been used in production in since 1883. S. carlsbergensis belongs to group I/Saaz-type lager yeast strains and is better adapted to cold growth conditions than group II/Frohberg-type lager yeasts, e.g., the Weihenstephan strain WS34/70. Here, we sequenced S. carlsbergensis using next generation sequencing technologies. Lager yeasts are descendants from hybrids formed between a S. cerevisiae parent and a parent similar to S. eubayanus. Accordingly, the S. carlsbergensis 19.5-Mb genome is substantially larger than the 12-Mb S. cerevisiae genome. Based on the sequence scaffolds, synteny to the S. cerevisae genome, and by using directed polymerase chain reaction for gap closure, we generated a chromosomal map of S. carlsbergensis consisting of 29 unique chromosomes. We present evidence for genome and chromosome evolution within S. carlsbergensis via chromosome loss and loss of heterozygosity specifically of parts derived from the S. cerevisiae parent. Based on our sequence data and via fluorescence-activated cell-sorting analysis, we determined the ploidy of S. carlsbergensis. This inferred that this strain is basically triploid with a diploid S. eubayanus and haploid S. cerevisiae genome content. In contrast the Weihenstephan strain, which we resequenced, is essentially tetraploid composed of two diploid S. cerevisiae and S. eubayanus genomes. Based on conserved translocations between the parental genomes in S. carlsbergensis and the Weihenstephan strain we propose a joint evolutionary ancestry for lager yeast strains. PMID:24578374

  16. Within-Host Variations of Human Papillomavirus Reveal APOBEC Signature Mutagenesis in the Viral Genome.

    PubMed

    Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

    2018-06-15

    Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.

  17. A Concise Atlas of Thyroid Cancer Next-Generation Sequencing Panel ThyroSeq v.2

    PubMed Central

    Alsina, Jorge; Alsina, Raul; Gulec, Seza

    2017-01-01

    The next-generation sequencing technology allows high out-put genomic analysis. An innovative assay in thyroid cancer, ThyroSeq® was developed for targeted mutation detection by next generation sequencing technology in fine needle aspiration and tissue samples. ThyroSeq v.2 next generation sequencing panel offers simultaneous sequencing and detection in >1000 hotspots of 14 thyroid cancer-related genes and for 42 types of gene fusions known to occur in thyroid cancer. ThyroSeq is being increasingly used to further narrow the indeterminate category defined by cytology for thyroid nodules. From a surgical perspective, genomic profiling also provides prognostic and predictive information and closely relates to determination of surgical strategy. Both the genomic analysis technology and the informatics for the cancer genome data base are rapidly developing. In this paper, we have gathered existing information on the thyroid cancer-related genes involved in the initiation and progression of thyroid cancer. Our goal is to assemble a glossary for the current ThyroSeq genomic panel that can help elucidate the role genomics play in thyroid cancer oncogenesis. PMID:28117295

  18. On the Sequence-Directed Nature of Human Gene Mutation: The Role of Genomic Architecture and the Local DNA Sequence Environment in Mediating Gene Mutations Underlying Human Inherited Disease

    PubMed Central

    Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min

    2011-01-01

    Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507

  19. Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines

    PubMed Central

    2009-01-01

    Background Parthenium argentatum (guayule) is an industrial crop that produces latex, which was recently commercialized as a source of latex rubber safe for people with Type I latex allergy. The complete plastid genome of P. argentatum was sequenced. The sequence provides important information useful for genetic engineering strategies. Comparison to the sequences of plastid genomes from three other members of the Asteraceae, Lactuca sativa, Guitozia abyssinica and Helianthus annuus revealed details of the evolution of the four genomes. Chloroplast-specific DNA barcodes were developed for identification of Parthenium species and lines. Results The complete plastid genome of P. argentatum is 152,803 bp. Based on the overall comparison of individual protein coding genes with those in L. sativa, G. abyssinica and H. annuus, we demonstrate that the P. argentatum chloroplast genome sequence is most closely related to that of H. annuus. Similar to chloroplast genomes in G. abyssinica, L. sativa and H. annuus, the plastid genome of P. argentatum has a large 23 kb inversion with a smaller 3.4 kb inversion, within the large inversion. Using the matK and psbA-trnH spacer chloroplast DNA barcodes, three of the four Parthenium species tested, P. tomentosum, P. hysterophorus and P. schottii, can be differentiated from P. argentatum. In addition, we identified lines within P. argentatum. Conclusion The genome sequence of the P. argentatum chloroplast will enrich the sequence resources of plastid genomes in commercial crops. The availability of the complete plastid genome sequence may facilitate transformation efficiency by using the precise sequence of endogenous flanking sequences and regulatory elements in chloroplast transformation vectors. The DNA barcoding study forms the foundation for genetic identification of commercially significant lines of P. argentatum that are important for producing latex. PMID:19917140

  20. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.

    PubMed

    Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J

    2017-06-20

    In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

    PubMed Central

    Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  2. VirSorter: mining viral signal from microbial genomic data.

    PubMed

    Roux, Simon; Enault, Francois; Hurwitz, Bonnie L; Sullivan, Matthew B

    2015-01-01

    Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction softwares to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.

  3. VirSorter: mining viral signal from microbial genomic data

    PubMed Central

    Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.

    2015-01-01

    Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction softwares to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems. PMID:26038737

  4. Draft genome sequence of marine-derived Streptomyces sp. TP-A0598, a producer of anti-MRSA antibiotic lydicamycins.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro

    2015-01-01

    Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned with COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned to be responsible for lydicamycin biosynthesis and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. This genome sequence data will facilitate to probe the potential of secondary metabolism in marine-derived Streptomyces.

  5. Complete Genome Sequence of an Avian Paramyxovirus Type 4 Strain Isolated from Domestic Duck at a Live Bird Market in South Korea.

    PubMed

    Tseren-Ochir, Erdene-Ochir; Yuk, Seong-Su; Kwon, Jung-Hoon; Noh, Jin-Yong; Hong, Woo-Tack; Jeong, Jei-Hyun; Jeong, Sol; Kim, Yu-Jin; Kim, Kyu-Jik; Lee, Ji-Ho; Kim, Jun-Beom; Lee, Joong-Bok; Park, Seung-Yong; Choi, In-Soo; Lee, Sang-Won; Song, Chang-Seon

    2017-05-18

    We report here the first full-genome sequence of an avian paramyxovirus type 4 (APMV-4) strain isolated from a domestic mallard duck at a live bird market in South Korea. Phylogenetic analyses provide genetic information on a new genetic clade, APMV-4, isolated from a domestic duck and evidence of APMV-4 exchange between poultry and wild birds. Copyright © 2017 Tseren-Ochir et al.

  6. Non-contiguous finished genome sequence and description of Oceanobacillus massiliensis sp. nov.

    PubMed Central

    Roux, Véronique; Million, Matthieu; Robert, Catherine; Magne, Alix; Raoult, Didier

    2013-01-01

    Oceanobacillus massiliensis strain N’DiopT sp. nov. is the type strain of O. massiliensis sp. nov., a new species within the genus Oceanobacillus. This strain, whose genome is described here, was isolated from the fecal flora of a healthy patient. O. massiliensis is an aerobic rod. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,532,675 bp long genome contains 3,519 protein-coding genes and 72 RNA genes, including between 6 and 8 rRNA operons. PMID:24976893

  7. Complete genome sequence of the fire blight pathogen Erwinia pyrifoliae DSM 12163T and comparative genomic insights into plant pathogenicity

    PubMed Central

    2010-01-01

    Background Erwinia pyrifoliae is a newly described necrotrophic pathogen, which causes fire blight on Asian (Nashi) pear and is geographically restricted to Eastern Asia. Relatively little is known about its genetics compared to the closely related main fire blight pathogen E. amylovora. Results The genome of the type strain of E. pyrifoliae strain DSM 12163T, was sequenced using both 454 and Solexa pyrosequencing and annotated. The genome contains a circular chromosome of 4.026 Mb and four small plasmids. Based on their respective role in virulence in E. amylovora or related organisms, we identified several putative virulence factors, including type III and type VI secretion systems and their effectors, flagellar genes, sorbitol metabolism, iron uptake determinants, and quorum-sensing components. A deletion in the rpoS gene covering the most conserved region of the protein was identified which may contribute to the difference in virulence/host-range compared to E. amylovora. Comparative genomics with the pome fruit epiphyte Erwinia tasmaniensis Et1/99 showed that both species are overall highly similar, although specific differences were identified, for example the presence of some phage gene-containing regions and a high number of putative genomic islands containing transposases in the E. pyrifoliae DSM 12163T genome. Conclusions The E. pyrifoliae genome is an important addition to the published genome of E. tasmaniensis and the unfinished genome of E. amylovora providing a foundation for re-sequencing additional strains that may shed light on the evolution of the host-range and virulence/pathogenicity of this important group of plant-associated bacteria. PMID:20047678

  8. Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

    PubMed Central

    Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

    2001-01-01

    The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501

  9. Complete sequence of the genome of avian paramyxovirus type 2 (strain Yucaipa) and comparison with other paramyxoviruses

    PubMed Central

    Subbiah, Madhuri; Xiao, Sa; Collins, Peter L.; Samal, Siba K

    2009-01-01

    The complete RNA genome sequence of avian paramyxovirus (APMV) serotype 2, strain Yucaipa isolated from chicken has been determined. With genome size of 14,904 nucleotides (nt), strain Yucaipa is consistent with the “rule of six” and is the smallest virus reported to date among the members of subfamily Paramyxovirinae. The genome contains six non-overlapping genes in the order 3′-N-P/V-M-F-HN-L-5′. The genes are flanked on either side by highly-conserved transcription start and stop signals and have intergenic sequences varying in length from 3 to 23 nt. The genome contains a 55 nt leader sequence at 3′ end and a 154 nt trailer sequence at 5′ end. Alignment and phylogenetic analysis of the predicted amino acid sequences of strain Yucaipa proteins with the cognate proteins of viruses of all of the five genera of family Paramyxoviridae showed that APMV-2 strain Yucaipa is more closely related to APMV-6 than APMV-1. PMID:18603323

  10. Genome sequence of the pink to light reddish-pigmented Rubellimicrobium mesophilum type strain (DSM 19309(T)), a representative of the Roseobacter group isolated from soil, and emended description of the species.

    PubMed

    Riedel, Thomas; Spring, Stefan; Fiebig, Anne; Petersen, Jörn; Göker, Markus; Klenk, Hans-Peter

    2014-06-15

    Rubellimicrobium mesophilum Dastager et al. 2008 is a mesophilic and light reddish-pigmented representative of the Roseobacter group within the alphaproteobacterial family Rhodobacteraceae. Representatives of the Roseobacter group play an important role in the marine biogeochemical cycles and were found in a broad variety of marine environments associated with algal blooms, different kinds of sediments, and surfaces of invertebrates and vertebrates. Roseobacters were shown to be widely distributed, especially within the total bacterial community found in coastal waters, as well as in mixed water layers of the open ocean. Here we describe the features of R. mesophilum strain MSL-20(T) together with its genome sequence and annotation generated from a culture of DSM 19309(T). The 4,927,676 bp genome sequence consists of one chromosome and probably one extrachromosomal element. It contains 5,082 protein-coding genes and 56 RNA genes. As previously reported, the G+C content is significantly different from the actual genome sequence-based G+C content and as the type strain tests positively for oxidase, the species description is emended accordingly. The genome was sequenced as part of the activities of the Transregional Collaborative Research Centre 51 (TRR51) funded by the German Research Foundation (DFG).

  11. Food Safety in the Age of Next Generation Sequencing, Bioinformatics, and Open Data Access.

    PubMed

    Taboada, Eduardo N; Graham, Morag R; Carriço, João A; Van Domselaar, Gary

    2017-01-01

    Public health labs and food regulatory agencies globally are embracing whole genome sequencing (WGS) as a revolutionary new method that is positioned to replace numerous existing diagnostic and microbial typing technologies with a single new target: the microbial draft genome. The ability to cheaply generate large amounts of microbial genome sequence data, combined with emerging policies of food regulatory and public health institutions making their microbial sequences increasingly available and public, has served to open up the field to the general scientific community. This open data access policy shift has resulted in a proliferation of data being deposited into sequence repositories and of novel bioinformatics software designed to analyze these vast datasets. There also has been a more recent drive for improved data sharing to achieve more effective global surveillance, public health and food safety. Such developments have heightened the need for enhanced analytical systems in order to process and interpret this new type of data in a timely fashion. In this review we outline the emergence of genomics, bioinformatics and open data in the context of food safety. We also survey major efforts to translate genomics and bioinformatics technologies out of the research lab and into routine use in modern food safety labs. We conclude by discussing the challenges and opportunities that remain, including those expected to play a major role in the future of food safety science.

  12. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

    PubMed

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

  13. Full genome sequences of zebra-borne equine herpesvirus type 1 isolated from zebra, onager and Thomson's gazelle.

    PubMed

    Guo, Xiaoqin; Izume, Satoko; Okada, Ayaka; Ohya, Kenji; Kimura, Takashi; Fukushi, Hideto

    2014-09-01

    A strain of equine herpesvirus type 1 (EHV-1) was isolated from zebra. This strain, called "zebra-borne EHV-1", was also isolated from an onager and a gazelle in zoological gardens in U.S.A. The full genome sequences of the 3 strains were determined. They shared 99% identities with each other, while they shared 98% and 95% identities with the horse derived EHV-1 and equine herpesvirus type 9, respectively. Sequence data indicated that the EHV-1 isolated from a polar bear in Germany is one of the zebra-borne EHV-1 and not a recombinant virus. These results indicated that zebra-borne EHV-1 is a subtype of EHV-1.

  14. In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas.

    PubMed

    Tran, Phuong N; Savka, Michael A; Gan, Han Ming

    2017-01-01

    The genus Pseudomonas has one of the largest diversity of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on 16S rRNA gene or conventional biochemical assay, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa ) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three-hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans - P. oryzihabitans , and P. kilonensis- P. brassicacearum , that were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques.

  15. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    PubMed

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  16. Complete genome sequence of Methanospirillum hungatei type strain JF1

    DOE PAGES

    Gunsalus, Robert; Cook, Lauren E.; Crable, Bryan R.; ...

    2016-01-06

    Methanospirillum hungatei strain JF1 (DSM 864) is a methane-producing archaeon and is the type species of the genus Methanospirillum, which belongs to the family Methanospirillaceae within the order Methanomicrobiales. Its genome was selected for sequencing due to its ability to utilize hydrogen and carbon dioxide and/or formate as a sole source of energy. Ecologically, M. hungatei functions as the hydrogen- and/or formate-using partner with many species of syntrophic bacteria. Its morphology is distinct from other methanogens with the ability to form long chains of cells (up to 100 m in length), which are enclosed within a sheath-like structure, and terminalmore » cells with polar flagella. The genome of M. hungatei strain JF1 is the first completely sequenced genome of the family Methanospirillaceae, and it has a circular genome of 3,544,738 bp containing 3,239 protein coding and 68 RNA genes. Furthermore, the large genome of M. hungatei JF1 suggests the presence of unrecognized biochemical/physiological properties that likely extend to the other Methanospirillaceae and include the ability to form the unusual sheath-like structure and to successfully interact with syntrophic bacteria.« less

  17. Position-based scanning for comparative genomics and identification of genetic islands in Haemophilus influenzae type b.

    PubMed

    Bergman, Nicholas H; Akerley, Brian J

    2003-03-01

    Bacteria exhibit extensive genetic heterogeneity within species. In many cases, these differences account for virulence properties unique to specific strains. Several such loci have been discovered in the genome of the type b serotype of Haemophilus influenzae, a human pathogen able to cause meningitis, pneumonia, and septicemia. Here we report application of a PCR-based scanning procedure to compare the genome of a virulent type b (Hib) strain with that of the laboratory-passaged Rd KW20 strain for which a complete genome sequence is available. We have identified seven DNA segments or H. influenzae genetic islands (HiGIs) present in the type b genome and absent from the Rd genome. These segments vary in size and content and show signs of horizontal gene transfer in that their percent G+C content differs from that of the rest of the H. influenzae genome, they contain genes similar to those found on phages or other mobile elements, or they are flanked by DNA repeats. Several of these loci represent potential pathogenicity islands, because they contain genes likely to mediate interactions with the host. These newly identified genetic islands provide areas of investigation into both the evolution and pathogenesis of H. influenzae. In addition, the genome scanning approach developed to identify these islands provides a rapid means to compare the genomes of phenotypically diverse bacterial strains once the genome sequence of one representative strain has been determined.

  18. Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954

  19. BAC end sequencing of Pacific white shrimp Litopenaeus vannamei: a glimpse into the genome of Penaeid shrimp

    NASA Astrophysics Data System (ADS)

    Zhao, Cui; Zhang, Xiaojun; Liu, Chengzhang; Huan, Pin; Li, Fuhua; Xiang, Jianhai; Huang, Chao

    2012-05-01

    Little is known about the genome of Pacific white shrimp ( Litopenaeus vannamei). To address this, we conducted BAC (bacterial artificial chromosome) end sequencing of L. vannamei. We selected and sequenced 7 812 BAC clones from the BAC library LvHE from the two ends of the inserts by Sanger sequencing. After trimming and quality filtering, 11 279 BAC end sequences (BESs) including 4 609 pairedends BESs were obtained. The total length of the BESs was 4 340 753 bp, representing 0.18% of the L. vannamei haploid genome. The lengths of the BESs ranged from 100 bp to 660 bp with an average length of 385 bp. Analysis of the BESs indicated that the L. vannamei genome is AT-rich and that the primary repeats patterns were simple sequence repeats (SSRs) and low complexity sequences. Dinucleotide and hexanucleotide repeats were the most common SSR types in the BESs. The most abundant transposable element was gypsy, which may contribute to the generation of the large genome size of L. vannamei. We successfully annotated 4 519 BESs by BLAST searching, including genes involved in immunity and sex determination. Our results provide an important resource for functional gene studies, map construction and integration, and complete genome assembly for this species.

  20. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira speices, some of which have been previously demonstrated that can be laterally transferred among different species, such as restriction-modification systems-coding genes. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggested that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provided in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as provide a valuable resource for functional genomics studies. PMID:27330141

  1. Optimized guide RNA structure for genome editing via Cas9

    PubMed Central

    Xu, Jianyong; Lian, Wei; Jia, Yuning; Li, Lingyun; Huang, Zhong

    2017-01-01

    The genome editing tool Cas9-gRNA (guide RNA) has been successfully applied in different cell types and organisms with high efficiency. However, more efforts need to be made to enhance both efficiency and specificity. In the current study, we optimized the guide RNA structure of Streptococcus pyogenes CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system to improve its genome editing efficiency. Comparing with the original functional structure of guide RNA, which is composed of crRNA and tracrRNA, the widely used chimeric gRNA has shorter crRNA and tracrRNA sequence. The deleted RNA sequence could form extra loop structure, which might enhance the stability of the guide RNA structure and subsequently the genome editing efficiency. Thus the genome editing efficiency of different forms of guide RNA was tested. And we found that the chimeric structure of gRNA with original full length of crRNA and tracrRNA showed higher genome editing efficiency than the conventional chimeric structure or other types of gRNA we tested. Therefore our data here uncovered the new type of gRNA structure with higher genome editing efficiency. PMID:29212218

  2. Investigation of Outbreaks of Salmonella enterica Serovar Typhimurium and Its Monophasic Variants Using Whole-Genome Sequencing, Denmark

    PubMed Central

    Gymoese, Pernille; Sørensen, Gitte; Litrup, Eva; Olsen, John Elmerdal; Nielsen, Eva Møller

    2017-01-01

    Whole-genome sequencing is rapidly replacing current molecular typing methods for surveillance purposes. Our study evaluates core-genome single-nucleotide polymorphism analysis for outbreak detection and linking of sources of Salmonella enterica serovar Typhimurium and its monophasic variants during a 7-month surveillance period in Denmark. We reanalyzed and defined 8 previously characterized outbreaks from the phylogenetic relatedness of the isolates, epidemiologic data, and food traceback investigations. All outbreaks were identified, and we were able to exclude unrelated and include additional related human cases. We were furthermore able to link possible food and veterinary sources to the outbreaks. Isolates clustered according to sequence types (STs) 19, 34, and 36. Our study shows that core-genome single-nucleotide polymorphism analysis is suitable for surveillance and outbreak investigation for Salmonella Typhimurium (ST19 and ST36), but whole genome–wide analysis may be required for the tight genetic clone of monophasic variants (ST34). PMID:28930002

  3. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  4. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  5. Draft genome sequence of Lactobacillus mali KCTC 3596.

    PubMed

    Kim, Dong-Wook; Choi, Sang-Haeng; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dae-Soo; Kim, Ryong Nam; Kim, Aeri; Park, Hong-Seog

    2011-09-01

    We announce the draft genome sequence of the type strain Lactobacillus mali KCTC 3596 (2,652,969 bp, with a G+C content of 36.0%), which is one of the most prevalent lactic acid bacteria present during the manufacturing process of apple juice. The genome consists of 122 large contigs (>100 bp). All of the contigs were assembled by Newbler Assembler 2.3 (454 Life Science). Copyright © 2011, American Society for Microbiology. All Rights Reserved.

  6. Genome of Herbaspirillum seropedicae Strain SmR1, a Specialized Diazotrophic Endophyte of Tropical Grasses

    PubMed Central

    Pedrosa, Fábio O.; Monteiro, Rose Adele; Wassem, Roseli; Cruz, Leonardo M.; Ayub, Ricardo A.; Colauto, Nelson B.; Fernandez, Maria Aparecida; Fungaro, Maria Helena P.; Grisard, Edmundo C.; Hungria, Mariangela; Madeira, Humberto M. F.; Nodari, Rubens O.; Osaku, Clarice A.; Petzl-Erler, Maria Luiza; Terenzi, Hernán; Vieira, Luiz G. E.; Steffens, Maria Berenice R.; Weiss, Vinicius A.; Pereira, Luiz F. P.; Almeida, Marina I. M.; Alves, Lysangela R.; Marin, Anelis; Araujo, Luiza Maria; Balsanelli, Eduardo; Baura, Valter A.; Chubatsu, Leda S.; Faoro, Helisson; Favetti, Augusto; Friedermann, Geraldo; Glienke, Chirlei; Karp, Susan; Kava-Cordeiro, Vanessa; Raittz, Roberto T.; Ramos, Humberto J. O.; Ribeiro, Enilze Maria S. F.; Rigo, Liu Un; Rocha, Saul N.; Schwab, Stefan; Silva, Anilda G.; Souza, Eliel M.; Tadra-Sfeir, Michelle Z.; Torres, Rodrigo A.; Dabul, Audrei N. G.; Soares, Maria Albertina M.; Gasques, Luciano S.; Gimenes, Ciela C. T.; Valle, Juliana S.; Ciferri, Ricardo R.; Correa, Luiz C.; Murace, Norma K.; Pamphile, João A.; Patussi, Eliana Valéria; Prioli, Alberto J.; Prioli, Sonia Maria A.; Rocha, Carmem Lúcia M. S. C.; Arantes, Olívia Márcia N.; Furlaneto, Márcia Cristina; Godoy, Leandro P.; Oliveira, Carlos E. C.; Satori, Daniele; Vilas-Boas, Laurival A.; Watanabe, Maria Angélica E.; Dambros, Bibiana Paula; Guerra, Miguel P.; Mathioni, Sandra Marisa; Santos, Karine Louise; Steindel, Mario; Vernal, Javier; Barcellos, Fernando G.; Campo, Rubens J.; Chueire, Ligia Maria O.; Nicolás, Marisa Fabiana; Pereira-Ferrari, Lilian; da Conceição Silva, José L.; Gioppo, Nereida M. R.; Margarido, Vladimir P.; Menck-Soares, Maria Amélia; Pinto, Fabiana Gisele S.; Simão, Rita de Cássia G.; Takahashi, Elizabete K.; Yates, Marshall G.; Souza, Emanuel M.

    2011-01-01

    The molecular mechanisms of plant recognition, colonization, and nutrient exchange between diazotrophic endophytes and plants are scarcely known. Herbaspirillum seropedicae is an endophytic bacterium capable of colonizing intercellular spaces of grasses such as rice and sugar cane. The genome of H. seropedicae strain SmR1 was sequenced and annotated by The Paraná State Genome Programme—GENOPAR. The genome is composed of a circular chromosome of 5,513,887 bp and contains a total of 4,804 genes. The genome sequence revealed that H. seropedicae is a highly versatile microorganism with capacity to metabolize a wide range of carbon and nitrogen sources and with possession of four distinct terminal oxidases. The genome contains a multitude of protein secretion systems, including type I, type II, type III, type V, and type VI secretion systems, and type IV pili, suggesting a high potential to interact with host plants. H. seropedicae is able to synthesize indole acetic acid as reflected by the four IAA biosynthetic pathways present. A gene coding for ACC deaminase, which may be involved in modulating the associated plant ethylene-signaling pathway, is also present. Genes for hemagglutinins/hemolysins/adhesins were found and may play a role in plant cell surface adhesion. These features may endow H. seropedicae with the ability to establish an endophytic life-style in a large number of plant species. PMID:21589895

  7. Genome of Herbaspirillum seropedicae strain SmR1, a specialized diazotrophic endophyte of tropical grasses.

    PubMed

    Pedrosa, Fábio O; Monteiro, Rose Adele; Wassem, Roseli; Cruz, Leonardo M; Ayub, Ricardo A; Colauto, Nelson B; Fernandez, Maria Aparecida; Fungaro, Maria Helena P; Grisard, Edmundo C; Hungria, Mariangela; Madeira, Humberto M F; Nodari, Rubens O; Osaku, Clarice A; Petzl-Erler, Maria Luiza; Terenzi, Hernán; Vieira, Luiz G E; Steffens, Maria Berenice R; Weiss, Vinicius A; Pereira, Luiz F P; Almeida, Marina I M; Alves, Lysangela R; Marin, Anelis; Araujo, Luiza Maria; Balsanelli, Eduardo; Baura, Valter A; Chubatsu, Leda S; Faoro, Helisson; Favetti, Augusto; Friedermann, Geraldo; Glienke, Chirlei; Karp, Susan; Kava-Cordeiro, Vanessa; Raittz, Roberto T; Ramos, Humberto J O; Ribeiro, Enilze Maria S F; Rigo, Liu Un; Rocha, Saul N; Schwab, Stefan; Silva, Anilda G; Souza, Eliel M; Tadra-Sfeir, Michelle Z; Torres, Rodrigo A; Dabul, Audrei N G; Soares, Maria Albertina M; Gasques, Luciano S; Gimenes, Ciela C T; Valle, Juliana S; Ciferri, Ricardo R; Correa, Luiz C; Murace, Norma K; Pamphile, João A; Patussi, Eliana Valéria; Prioli, Alberto J; Prioli, Sonia Maria A; Rocha, Carmem Lúcia M S C; Arantes, Olívia Márcia N; Furlaneto, Márcia Cristina; Godoy, Leandro P; Oliveira, Carlos E C; Satori, Daniele; Vilas-Boas, Laurival A; Watanabe, Maria Angélica E; Dambros, Bibiana Paula; Guerra, Miguel P; Mathioni, Sandra Marisa; Santos, Karine Louise; Steindel, Mario; Vernal, Javier; Barcellos, Fernando G; Campo, Rubens J; Chueire, Ligia Maria O; Nicolás, Marisa Fabiana; Pereira-Ferrari, Lilian; Silva, José L da Conceição; Gioppo, Nereida M R; Margarido, Vladimir P; Menck-Soares, Maria Amélia; Pinto, Fabiana Gisele S; Simão, Rita de Cássia G; Takahashi, Elizabete K; Yates, Marshall G; Souza, Emanuel M

    2011-05-01

    The molecular mechanisms of plant recognition, colonization, and nutrient exchange between diazotrophic endophytes and plants are scarcely known. Herbaspirillum seropedicae is an endophytic bacterium capable of colonizing intercellular spaces of grasses such as rice and sugar cane. The genome of H. seropedicae strain SmR1 was sequenced and annotated by The Paraná State Genome Programme--GENOPAR. The genome is composed of a circular chromosome of 5,513,887 bp and contains a total of 4,804 genes. The genome sequence revealed that H. seropedicae is a highly versatile microorganism with capacity to metabolize a wide range of carbon and nitrogen sources and with possession of four distinct terminal oxidases. The genome contains a multitude of protein secretion systems, including type I, type II, type III, type V, and type VI secretion systems, and type IV pili, suggesting a high potential to interact with host plants. H. seropedicae is able to synthesize indole acetic acid as reflected by the four IAA biosynthetic pathways present. A gene coding for ACC deaminase, which may be involved in modulating the associated plant ethylene-signaling pathway, is also present. Genes for hemagglutinins/hemolysins/adhesins were found and may play a role in plant cell surface adhesion. These features may endow H. seropedicae with the ability to establish an endophytic life-style in a large number of plant species.

  8. Evolution of the P-type II ATPase gene family in the fungi and presence of structural genomic changes among isolates of Glomus intraradices.

    PubMed

    Corradi, Nicolas; Sanders, Ian R

    2006-03-10

    The P-type II ATPase gene family encodes proteins with an important role in adaptation of the cell to variation in external K+, Ca2+ and Na2+ concentrations. The presence of P-type II gene subfamilies that are specific for certain kingdoms has been reported but was sometimes contradicted by discovery of previously unknown homologous sequences in newly sequenced genomes. Members of this gene family have been sampled in all of the fungal phyla except the arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota), which are known to play a key-role in terrestrial ecosystems and to be genetically highly variable within populations. Here we used highly degenerate primers on AMF genomic DNA to increase the sampling of fungal P-Type II ATPases and to test previous predictions about their evolution. In parallel, homologous sequences of the P-type II ATPases have been used to determine the nature and amount of polymorphism that is present at these loci among isolates of Glomus intraradices harvested from the same field. In this study, four P-type II ATPase sub-families have been isolated from three AMF species. We show that, contrary to previous predictions, P-type IIC ATPases are present in all basal fungal taxa. Additionally, P-Type IIE ATPases should no longer be considered as exclusive to the Ascomycota and the Basidiomycota, since we also demonstrate their presence in the Zygomycota. Finally, a comparison of homologous sequences encoding P-type IID ATPases showed unexpectedly that indel mutations among coding regions, as well as specific gene duplications occur among AMF individuals within the same field. On the basis of these results we suggest that the diversification of P-Type IIC and E ATPases followed the diversification of the extant fungal phyla with independent events of gene gains and losses. Consistent with recent findings on the human genome, but at a much smaller geographic scale, we provided evidence that structural genomic changes, such as exonic indel mutations and gene duplications are less rare than previously thought and that these also occur within fungal populations.

  9. Genome Sequence of Pseudomonas sp. Strain S9, an Extracellular Arylsulfatase-Producing Bacterium Isolated from Mangrove Soil ▿

    PubMed Central

    Long, Mengxian; Ruan, Lingwei; Yu, Ziniu; Xu, Xun

    2011-01-01

    Pseudomonas sp. strain S9 was originally isolated from mangrove soil in Xiamen, China. It is an aerobic bacterium which shows extracellular arylsulfatase activity. Here, we describe the 4.8-Mb draft genome sequence of Pseudomonas sp. S9, which exhibits novel cysteine-type sulfatases. PMID:21622746

  10. Genome Sequence of Lactobacillus farciminis KCTC 3681▿

    PubMed Central

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-01-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds. PMID:21257766

  11. Genome sequence of Lactobacillus farciminis KCTC 3681.

    PubMed

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-04-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds.

  12. Development of Fibroblast Cell Lines From the Cow Used to Sequence the Bovine Genome

    USDA-ARS?s Scientific Manuscript database

    Two cell lines, designated MARC.BGCF.2 and MARC.BGCF.1-3, were initiated from skin biopsies obtained from the Hereford cow whose DNA was used in sequencing the bovine genome. These cell lines were submitted to American Type Culture Collection (ATCC, Manassas, VA, USA) and will be made publicly avai...

  13. In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Aloreña green table olives.

    PubMed

    Abriouel, Hikmate; Pérez Montoro, Beatriz; Casado Muñoz, María Del Carmen; Knapp, Charles W; Gálvez, Antonio; Benomar, Nabil

    2017-01-01

    Lactobacillus pentosus MP-10, isolated from brines of naturally fermented Aloreña green table olives, exhibited high probiotic potential. The genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli, highlighting the microorganism's ecological flexibility and adaptability. Here, we analyzed the complete genome sequence for the presence of acquired antibiotic resistance and virulence determinants to understand their defense mechanisms and explore its putative safety in food. The annotated genome sequence revealed evidence of diverse mobile genetic elements, such as prophages, transposases and transposons involved in their adaptation to brine-associated niches. In-silico analysis of L. pentosus MP-10 genome sequence identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) as an immune system against foreign genetic elements, which consisted of six arrays (4-12 repeats) and eleven predicted cas genes [CRISPR1 and CRISPR2 consisted of 3 (Type II-C) and 8 (Type I) genes] with high similarity to L. pentosus KCA1. Bioinformatic analyses revealed L. pentosus MP-10 to be absent of acquired antibiotic resistance genes, and most resistance genes were related to efflux mechanisms; no virulence determinants were found in the genome. This suggests that L. pentosus MP-10 could be considered safe and with high-adaptation potential, which could facilitate its application as a starter culture and probiotic in food preparations.

  14. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  15. Complete Genome Sequence of an Avian Paramyxovirus Type 4 from North America Reveals a Shorter Genome and New Genotype

    PubMed Central

    Parthiban, Manoharan; Kaliyaperumal, Manimaran; Xiao, Sa; Nayak, Baibaswata; Paldurai, Anandan; Kim, Shin-Hee; Ladman, Brian S.; Preskenis, Lauren A.; Gelb, Jack; Collins, Peter L.

    2013-01-01

    An avian paramyxovirus type 4 (APMV-4) was isolated from a duck in Delaware in 2010. Its genome is 15,048 nucleotides (nt) long, which is shorter by 6 nt than those for all previously reported strains. Phylogenetic analysis revealed that this strain formed a separate cluster within APMV-4 strains. PMID:23405329

  16. Rewriting the blueprint of life by synthetic genomics and genome engineering.

    PubMed

    Annaluru, Narayana; Ramalingam, Sivaprakash; Chandrasegaran, Srinivasan

    2015-06-16

    Advances in DNA synthesis and assembly methods over the past decade have made it possible to construct genome-size fragments from oligonucleotides. Early work focused on synthesis of small viral genomes, followed by hierarchical synthesis of wild-type bacterial genomes and subsequently on transplantation of synthesized bacterial genomes into closely related recipient strains. More recently, a synthetic designer version of yeast Saccharomyces cerevisiae chromosome III has been generated, with numerous changes from the wild-type sequence without having an impact on cell fitness and phenotype, suggesting plasticity of the yeast genome. A project to generate the first synthetic yeast genome--the Sc2.0 Project--is currently underway.

  17. The population structure of Vibrio cholerae from the Chandigarh Region of Northern India.

    PubMed

    Abd El Ghany, Moataz; Chander, Jagadish; Mutreja, Ankur; Rashid, Mamoon; Hill-Cawthorne, Grant A; Ali, Shahjahan; Naeem, Raeece; Thomson, Nicholas R; Dougan, Gordon; Pain, Arnab

    2014-07-01

    Cholera infection continues to be a threat to global public health. The current cholera pandemic associated with Vibrio cholerae El Tor has now been ongoing for over half a century. Thirty-eight V. cholerae El Tor isolates associated with a cholera outbreak in 2009 from the Chandigarh region of India were characterised by a combination of microbiology, molecular typing and whole-genome sequencing. The genomic analysis indicated that two clones of V. cholera circulated in the region and caused disease during this time. These clones fell into two distinct sub-clades that map independently onto wave 3 of the phylogenetic tree of seventh pandemic V. cholerae El Tor. Sequence analyses of the cholera toxin gene, the Vibrio seventh Pandemic Island II (VSPII) and SXT element correlated with this phylogenetic position of the two clades on the El Tor tree. The clade 2 isolates, characterized by a drug-resistant profile and the expression of a distinct cholera toxin, are closely related to the recent V. cholerae isolated elsewhere, including Haiti, but fell on a distinct branch of the tree, showing they were independent outbreaks. Multi-Locus Sequence Typing (MLST) distinguishes two sequence types among the 38 isolates, that did not correspond to the clades defined by whole-genome sequencing. Multi-Locus Variable-length tandem-nucleotide repeat Analysis (MLVA) identified 16 distinct clusters. The use of whole-genome sequencing enabled the identification of two clones of V. cholerae that circulated during the 2009 Chandigarh outbreak. These clones harboured a similar structure of ICEVchHai1 but differed mainly in the structure of CTX phage and VSPII. The limited capacity of MLST and MLVA to discriminate between the clones that circulated in the 2009 Chandigarh outbreak highlights the value of whole-genome sequencing as a route to the identification of further genetic markers to subtype V. cholerae isolates.

  18. Defining a Core Genome Multilocus Sequence Typing Scheme for the Global Epidemiology of Vibrio parahaemolyticus.

    PubMed

    Gonzalez-Escalona, Narjol; Jolley, Keith A; Reed, Elizabeth; Martinez-Urtaza, Jaime

    2017-06-01

    Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage. Copyright © 2017 Gonzalez-Escalona et al.

  19. Phenotypic and genomic analysis of serotype 3 Sabin poliovirus vaccine produced in MRC-5 cell substrate.

    PubMed

    Alirezaie, Behnam; Taqavian, Mohammad; Aghaiypour, Khosrow; Esna-Ashari, Fatemeh; Shafyi, Abbas

    2011-05-01

    The cell substrate has a pivotal role in live virus vaccines production. It is necessary to evaluate the effects of the cell substrate on the properties of the propagated viruses, especially in the case of viruses which are unstable genetically such as polioviruses, by monitoring the molecular and phenotypical characteristics of harvested viruses. To investigate the presence/absence of mutation(s), the near full-length genomic sequence of different harvests of the type 3 Sabin strain of poliovirus propagated in MRC-5 cells were determined. The sequences were compared with genomic sequences of different virus seeds, vaccines, and OPV-like isolates. Nearly complete genomic sequencing results, however, revealed no detectable mutations throughout the genome RNA-plaque purified (RSO)-derived monopool of type 3 OPVs manufactured in MRC-5. Thirty-six years of experience in OPV production, trend analysis, and vaccine surveillance also suggest that: (i) different monopools of serotype 3 OPV produced in MRC-5 retained their phenotypic characteristics (temperature sensitivity and neuroattenuation), (ii) MRC-5 cells support the production of acceptable virus yields, (iii) OPV replicated in the MRC-5 cell substrate is a highly efficient and safe vaccine. These results confirm previous reports that MRC-5 is a desirable cell substrate for the production of OPV. Copyright © 2011 Wiley-Liss, Inc.

  20. High quality draft genome sequence of Corynebacterium ulceribovis type strain IMMIB-L1395T (DSM 45146T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yassin, Atteyet F.; Lapidus, Alla; Han, James

    We report that the Corynebacterium ulceribovis strain IMMIB L-1395T (= DSM 45146T) is an aerobic to facultative anaerobic, Gram-positive, non-spore-forming, non-motile rod-shaped bacterium that was isolated from the skin of the udder of a cow, in Schleswig Holstein, Germany. The cell wall of C. ulceribovis contains corynemycolic acids. The cellular fatty acids are those described for the genus Corynebacterium, but tuberculostearic acid is not present. Here we describe the features of C. ulceribovis strain IMMIB L-1395T, together with genome sequence information and its annotation. The 2,300,451 bp long genome containing 2,104 protein-coding genes and 54 RNA-encoding genes and is partmore » of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less

  1. High quality draft genome sequence of Corynebacterium ulceribovis type strain IMMIB-L1395T (DSM 45146T)

    DOE PAGES

    Yassin, Atteyet F.; Lapidus, Alla; Han, James; ...

    2015-08-05

    We report that the Corynebacterium ulceribovis strain IMMIB L-1395T (= DSM 45146T) is an aerobic to facultative anaerobic, Gram-positive, non-spore-forming, non-motile rod-shaped bacterium that was isolated from the skin of the udder of a cow, in Schleswig Holstein, Germany. The cell wall of C. ulceribovis contains corynemycolic acids. The cellular fatty acids are those described for the genus Corynebacterium, but tuberculostearic acid is not present. Here we describe the features of C. ulceribovis strain IMMIB L-1395T, together with genome sequence information and its annotation. The 2,300,451 bp long genome containing 2,104 protein-coding genes and 54 RNA-encoding genes and is partmore » of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less

  2. Host-Associated Genomic Features of the Novel Uncultured Intracellular Pathogen Ca. Ichthyocystis Revealed by Direct Sequencing of Epitheliocysts

    PubMed Central

    Qi, Weihong; Vaughan, Lloyd; Katharios, Pantelis; Schlapbach, Ralph; Seth-Smith, Helena M.B.

    2016-01-01

    Advances in single-cell and mini-metagenome sequencing have enabled important investigations into uncultured bacteria. In this study, we applied the mini-metagenome sequencing method to assemble genome drafts of the uncultured causative agents of epitheliocystis, an emerging infectious disease in the Mediterranean aquaculture species gilthead seabream. We sequenced multiple cyst samples and constructed 11 genome drafts from a novel beta-proteobacterial lineage, Candidatus Ichthyocystis. The draft genomes demonstrate features typical of pathogenic bacteria with an obligate intracellular lifestyle: a reduced genome of up to 2.6 Mb, reduced G + C content, and reduced metabolic capacity. Reconstruction of metabolic pathways reveals that Ca. Ichthyocystis genomes lack all amino acid synthesis pathways, compelling them to scavenge from the fish host. All genomes encode type II, III, and IV secretion systems, a large repertoire of predicted effectors, and a type IV pilus. These are all considered to be virulence factors, required for adherence, invasion, and host manipulation. However, no evidence of lipopolysaccharide synthesis could be found. Beyond the core functions shared within the genus, alignments showed distinction into different species, characterized by alternative large gene families. These comprise up to a third of each genome, appear to have arisen through duplication and diversification, encode many effector proteins, and are seemingly critical for virulence. Thus, Ca. Ichthyocystis represents a novel obligatory intracellular pathogenic beta-proteobacterial lineage. The methods used: mini-metagenome analysis and manual annotation, have generated important insights into the lifestyle and evolution of the novel, uncultured pathogens, elucidating many putative virulence factors including an unprecedented array of novel gene families. PMID:27190004

  3. Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation.

    PubMed

    Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-07-22

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types ( approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.

  4. Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation

    PubMed Central

    Denef, Vincent J; Goltsman, Daniela S. Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-01-01

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination. PMID:18651792

  5. Population Structure and Antimicrobial Resistance Profiles of Streptococcus suis Serotype 2 Sequence Type 25 Strains.

    PubMed

    Athey, Taryn B T; Teatero, Sarah; Takamatsu, Daisuke; Wasserscheid, Jessica; Dewar, Ken; Gottschalk, Marcelo; Fittipaldi, Nahuel

    2016-01-01

    Strains of serotype 2 Streptococcus suis are responsible for swine and human infections. Different serotype 2 genetic backgrounds have been defined using multilocus sequence typing (MLST). However, little is known about the genetic diversity within each MLST sequence type (ST). Here, we used whole-genome sequencing to test the hypothesis that S. suis serotype 2 strains of the ST25 lineage are genetically heterogeneous. We evaluated 51 serotype 2 ST25 S. suis strains isolated from diseased pigs and humans in Canada, the United States of America, and Thailand. Whole-genome sequencing revealed numerous large-scale rearrangements in the ST25 genome, compared to the genomes of ST1 and ST28 S. suis strains, which result, among other changes, in disruption of a pilus island locus. We report that recombination and lateral gene transfer contribute to ST25 genetic diversity. Phylogenetic analysis identified two main and distinct Thai and North American clades grouping most strains investigated. These clades also possessed distinct patterns of antimicrobial resistance genes, which correlated with acquisition of different integrative and conjugative elements (ICEs). Some of these ICEs were found to be integrated at a recombination hot spot, previously identified as the site of integration of the 89K pathogenicity island in serotype 2 ST7 S. suis strains. Our results highlight the limitations of MLST for phylogenetic analysis of S. suis, and the importance of lateral gene transfer and recombination as drivers of diversity in this swine pathogen and zoonotic agent.

  6. Population Structure and Antimicrobial Resistance Profiles of Streptococcus suis Serotype 2 Sequence Type 25 Strains

    PubMed Central

    Athey, Taryn B. T.; Teatero, Sarah; Takamatsu, Daisuke; Wasserscheid, Jessica; Dewar, Ken; Gottschalk, Marcelo; Fittipaldi, Nahuel

    2016-01-01

    Strains of serotype 2 Streptococcus suis are responsible for swine and human infections. Different serotype 2 genetic backgrounds have been defined using multilocus sequence typing (MLST). However, little is known about the genetic diversity within each MLST sequence type (ST). Here, we used whole-genome sequencing to test the hypothesis that S. suis serotype 2 strains of the ST25 lineage are genetically heterogeneous. We evaluated 51 serotype 2 ST25 S. suis strains isolated from diseased pigs and humans in Canada, the United States of America, and Thailand. Whole-genome sequencing revealed numerous large-scale rearrangements in the ST25 genome, compared to the genomes of ST1 and ST28 S. suis strains, which result, among other changes, in disruption of a pilus island locus. We report that recombination and lateral gene transfer contribute to ST25 genetic diversity. Phylogenetic analysis identified two main and distinct Thai and North American clades grouping most strains investigated. These clades also possessed distinct patterns of antimicrobial resistance genes, which correlated with acquisition of different integrative and conjugative elements (ICEs). Some of these ICEs were found to be integrated at a recombination hot spot, previously identified as the site of integration of the 89K pathogenicity island in serotype 2 ST7 S. suis strains. Our results highlight the limitations of MLST for phylogenetic analysis of S. suis, and the importance of lateral gene transfer and recombination as drivers of diversity in this swine pathogen and zoonotic agent. PMID:26954687

  7. Structural characterization of copia-type retrotransposons leads to insights into the marker development in a biofuel crop, Jatropha curcas L.

    PubMed Central

    2013-01-01

    Background Recently, Jatropha curcas L. has attracted worldwide attention for its potential as a source of biodiesel. However, most DNA markers have demonstrated high levels of genetic similarity among and within jatropha populations around the globe. Despite promising features of copia-type retrotransposons as ideal genetic tools for gene tagging, mutagenesis, and marker-assisted selection, they have not been characterized in the jatropha genome yet. Here, we examined the diversity, evolution, and genome-wide organization of copia-type retrotransposons in the Asian, African, and Mesoamerican accessions of jatropha, then introduced a retrotransposon-based marker for this biofuel crop. Results In total, 157 PCR fragments that were amplified using the degenerate primers for the reverse transcriptase (RT) domain of copia-type retroelements were sequenced and aligned to construct the neighbor-joining tree. Phylogenetic analysis demonstrated that isolated copia RT sequences were classified into ten families, which were then grouped into three lineages. An in-depth study of the jatropha genome for the RT sequences of each family led to the characterization of full consensus sequences of the jatropha copia-type families. Estimated copy numbers of target sequences were largely different among families, as was presence of genes within 5 kb flanking regions for each family. Five copia-type families were as appealing candidates for the development of DNA marker systems. A candidate marker from family Jc7 was particularly capable of detecting genetic variation among different jatropha accessions. Fluorescence in situ hybridization (FISH) to metaphase chromosomes reveals that copia-type retrotransposons are scattered across chromosomes mainly located in the distal part regions. Conclusion This is the first report on genome-wide analysis and the cytogenetic mapping of copia-type retrotransposons of jatropha, leading to the discovery of families bearing high potential as DNA markers. Distinct dynamics of individual copia-type families, feasibility of a retrotransposon-based insertion polymorphism marker system in examining genetic variability, and approaches for the development of breeding strategies in jatropha using copia-type retrotransposons are discussed. PMID:24020916

  8. Muscle RAS oncogene homolog (MRAS) recurrent mutation in Borrmann type IV gastric cancer.

    PubMed

    Yasumoto, Makiko; Sakamoto, Etsuko; Ogasawara, Sachiko; Isobe, Taro; Kizaki, Junya; Sumi, Akiko; Kusano, Hironori; Akiba, Jun; Torimura, Takuji; Akagi, Yoshito; Itadani, Hiraku; Kobayashi, Tsutomu; Hasako, Shinichi; Kumazaki, Masafumi; Mizuarai, Shinji; Oie, Shinji; Yano, Hirohisa

    2017-01-01

    The prognosis of patients with Borrmann type IV gastric cancer (Type IV) is extremely poor. Thus, there is an urgent need to elucidate the molecular mechanisms underlying the oncogenesis of Type IV and to identify new therapeutic targets. Although previous studies using whole-exome and whole-genome sequencing have elucidated genomic alterations in gastric cancer, none has focused on comprehensive genetic analysis of Type IV. To discover cancer-relevant genes in Type IV, we performed whole-exome sequencing and genome-wide copy number analysis on 13 patients with Type IV. Exome sequencing identified 178 somatic mutations in protein-coding sequences or at splice sites. Among the mutations, we found a mutation in muscle RAS oncogene homolog (MRAS), which is predicted to cause molecular dysfunction. MRAS belongs to the Ras subgroup of small G proteins, which includes the prototypic RAS oncogenes. We analyzed an additional 46 Type IV samples to investigate the frequency of MRAS mutation. There were eight nonsynonymous mutations (mutation frequency, 17%), showing that MRAS is recurrently mutated in Type IV. Copy number analysis identified six focal amplifications and one homozygous deletion, including insulin-like growth factor 1 receptor (IGF1R) amplification. The samples with IGF1R amplification had remarkably higher IGF1R mRNA and protein expression levels compared with the other samples. This is the first report of MRAS recurrent mutation in human tumor samples. Our results suggest that MRAS mutation and IGF1R amplification could drive tumorigenesis of Type IV and could be new therapeutic targets. © 2016 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

  9. Genomic analysis of Ascochyta rabiei identifies dynamic genome environments of solanapyrone biosynthesis gene cluster and a novel type of pathway-specific regulator

    USDA-ARS?s Scientific Manuscript database

    Secondary metabolite genes are often clustered together and situated in particular genomic regions such as the subtelomere, which can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full genome sequencing of...

  10. Efficient genome editing of differentiated renal epithelial cells.

    PubMed

    Hofherr, Alexis; Busch, Tilman; Huber, Nora; Nold, Andreas; Bohn, Albert; Viau, Amandine; Bienaimé, Frank; Kuehn, E Wolfgang; Arnold, Sebastian J; Köttgen, Michael

    2017-02-01

    Recent advances in genome editing technologies have enabled the rapid and precise manipulation of genomes, including the targeted introduction, alteration, and removal of genomic sequences. However, respective methods have been described mainly in non-differentiated or haploid cell types. Genome editing of well-differentiated renal epithelial cells has been hampered by a range of technological issues, including optimal design, efficient expression of multiple genome editing constructs, attainable mutation rates, and best screening strategies. Here, we present an easily implementable workflow for the rapid generation of targeted heterozygous and homozygous genomic sequence alterations in renal cells using transcription activator-like effector nucleases (TALENs) and the clustered regularly interspaced short palindromic repeat (CRISPR) system. We demonstrate the versatility of established protocols by generating novel cellular models for studying autosomal dominant polycystic kidney disease (ADPKD). Furthermore, we show that cell culture-validated genetic modifications can be readily applied to mouse embryonic stem cells (mESCs) for the generation of corresponding mouse models. The described procedure for efficient genome editing can be applied to any cell type to study physiological and pathophysiological functions in the context of precisely engineered genotypes.

  11. Serratia marcescens harbouring SME-type class A carbapenemases in Canada and the presence of blaSME on a novel genomic island, SmarGI1-1.

    PubMed

    Mataseje, L F; Boyd, D A; Delport, J; Hoang, L; Imperial, M; Lefebvre, B; Kuhn, M; Van Caeseele, P; Willey, B M; Mulvey, M R

    2014-07-01

    An increasing prevalence since 2010 of Serratia marcescens harbouring the Ambler class A carbapenemase SME prompted us to further characterize these isolates. Isolates harbouring bla(SME) were identified by PCR and sequencing. Phenotypic analysis for carbapenemase activity was carried out by a modified Hodge test and a modified Carba NP test. Antimicrobial susceptibilities were determined by Etest and Vitek 2. Typing was by PFGE of macrorestriction digests. Whole-genome sequencing of three isolates was carried out to characterize the genomic region harbouring the bla(SME)-type genes. All S. marcescens harbouring SME-type enzymes could be detected using a modified Carba NP test. Isolates harbouring bla(SME) were resistant to penicillins and carbapenems, but remained susceptible to third-generation cephalosporins, as well as fluoroquinolones and trimethoprim/sulfamethoxazole. Isolates exhibited diverse genetic backgrounds, though 57% of isolates were found in three clusters. Analysis of whole-genome sequence data from three isolates revealed that the bla(SME) gene occurred in a novel cryptic prophage genomic island, SmarGI1-1. There has been an increasing occurrence of S. marcescens harbouring bla(SME) in Canada since 2010. The bla(SME) gene was found on a genomic island, SmarGI1-1, that can be excised and circularized, which probably contributes to its dissemination amongst S. marcescens. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. The spectrum of genomic signatures: from dinucleotides to chaos game representation.

    PubMed

    Wang, Yingwei; Hill, Kathleen; Singh, Shiva; Kari, Lila

    2005-02-14

    In the post genomic era, access to complete genome sequence data for numerous diverse species has opened multiple avenues for examining and comparing primary DNA sequence organization of entire genomes. Previously, the concept of a genomic signature was introduced with the observation of species-type specific Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides were identified as the subsequences with the greatest bias in representation in a majority of genomes. Herein, we demonstrate that DRAP is one particular genomic signature contained within a broader spectrum of signatures. Within this spectrum, an alternative genomic signature, Chaos Game Representation (CGR), provides a unique visualization of patterns in sequence organization. A genomic signature is associated with a particular integer order or subsequence length that represents a measure of the resolution or granularity in the analysis of primary DNA sequence organization. We quantitatively explore the organizational information provided by genomic signatures of different orders through different distance measures, including a novel Image Distance. The Image Distance and other existing distance measures are evaluated by comparing the phylogenetic trees they generate for 26 complete mitochondrial genomes from a diversity of species. The phylogenetic tree generated by the Image Distance is compatible with the known relatedness of species. Quantitative evaluation of the spectrum of genomic signatures may be used to ultimately gain insight into the determinants and biological relevance of the genome signatures.

  13. Molecular epidemiology of Staphylococcus aureus bacteremia in a single large Minnesota medical center in 2015 as assessed using MLST, core genome MLST and spa typing.

    PubMed

    Park, Kyung-Hwa; Greenwood-Quaintance, Kerryl E; Uhl, James R; Cunningham, Scott A; Chia, Nicholas; Jeraldo, Patricio R; Sampathkumar, Priya; Nelson, Heidi; Patel, Robin

    2017-01-01

    Staphylococcus aureus is a leading cause of bacteremia in hospitalized patients. Whether or not S. aureus bacteremia (SAB) is associated with clonality, implicating potential nosocomial transmission, has not, however, been investigated. Herein, we examined the epidemiology of SAB using whole genome sequencing (WGS). 152 SAB isolates collected over the course of 2015 at a single large Minnesota medical center were studied. Staphylococcus protein A (spa) typing was performed by PCR/Sanger sequencing; multilocus sequence typing (MLST) and core genome MLST (cgMLST) were determined by WGS. Forty-eight isolates (32%) were methicillin-resistant S. aureus (MRSA). The isolates encompassed 66 spa types, clustered into 11 spa clonal complexes (CCs) and 10 singleton types. 88% of 48 MRSA isolates belonged to spa CC-002 or -008. Methicillin-susceptible S. aureus (MSSA) isolates were more genotypically diverse, with 61% distributed across four spa CCs (CC-002, CC-012, CC-008 and CC-084). By MLST, there was 31 sequence types (STs), including 18 divided into 6 CCs and 13 singleton STs. Amongst MSSA isolates, the common MLST clones were CC5 (23%), CC30 (19%), CC8 (15%) and CC15 (11%). Common MRSA clones were CC5 (67%) and CC8 (25%); there were no MRSA isolates in CC45 or CC30. By cgMLST analysis, there were 9 allelic differences between two isolates, with the remaining 150 isolates differing from each other by over 40 alleles. The two isolates were retroactively epidemiologically linked by medical record review. Overall, cgMLST analysis resulted in higher resolution epidemiological typing than did multilocus sequence or spa typing.

  14. Draft Genome Sequence of Mycobacterium chimaera Type ...

    EPA Pesticide Factsheets

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  15. Epstein-Barr Virus Sequence Variation—Biology and Disease

    PubMed Central

    Tzellos, Stelios; Farrell, Paul J.

    2012-01-01

    Some key questions in Epstein-Barr virus (EBV) biology center on whether naturally occurring sequence differences in the virus affect infection or EBV associated diseases. Understanding the pattern of EBV sequence variation is also important for possible development of EBV vaccines. At present EBV isolates worldwide can be grouped into Type 1 and Type 2, a classification based on the EBNA2 gene sequence. Type 1 EBV is the most prevalent worldwide but Type 2 is common in parts of Africa. Type 1 transforms human B cells into lymphoblastoid cell lines much more efficiently than Type 2 EBV. Molecular mechanisms that may account for this difference in cell transformation are now becoming clearer. Advances in sequencing technology will greatly increase the amount of whole EBV genome data for EBV isolated from different parts of the world. Study of regional variation of EBV strains independent of the Type 1/Type 2 classification and systematic investigation of the relationship between viral strains, infection and disease will become possible. The recent discovery that specific mutation of the EBV EBNA3B gene may be linked to development of diffuse large B cell lymphoma illustrates the importance that mutations in the virus genome may have in infection and human disease. PMID:25436768

  16. Draft genome sequence of the extremely halophilic archaeon Haladaptatus cibarius type strain D43(T) isolated from fermented seafood.

    PubMed

    Lee, Hae-Won; Kim, Dae-Won; Lee, Mi-Hwa; Kim, Byung-Yong; Cho, Yong-Joon; Yim, Kyung June; Song, Hye Seon; Rhee, Jin-Kyu; Seo, Myung-Ji; Choi, Hak-Jong; Choi, Jong-Soon; Lee, Dong-Gi; Yoon, Changmann; Nam, Young-Do; Roh, Seong Woon

    2015-01-01

    An extremely halophilic archaeon, Haladaptatus cibarius D43(T), was isolated from traditional Korean salt-rich fermented seafood. Strain D43(T) shows the highest 16S rRNA gene sequence similarity (98.7 %) with Haladaptatus litoreus RO1-28(T), is Gram-negative staining, motile, and extremely halophilic. Despite potential industrial applications of extremely halophilic archaea, their genome characteristics remain obscure. Here, we describe the whole genome sequence and annotated features of strain D43(T). The 3,926,724 bp genome includes 4,092 protein-coding and 57 RNA genes (including 6 rRNA and 49 tRNA genes) with an average G + C content of 57.76 %.

  17. Genomic characterization of a Helicobacter pylori isolate from a patient with gastric cancer in China

    PubMed Central

    2014-01-01

    Background Helicobacter pylori is well known for its relationship with the occurrence of several severe gastric diseases. The mechanisms of pathogenesis triggered by H. pylori are less well known. In this study, we report the genome sequence and genomic characterizations of H. pylori strain HLJ039 that was isolated from a patient with gastric cancer in the Chinese province of Heilongjiang, where there is a high incidence of gastric cancer. To investigate potential genomic features that may be involved in pathogenesis of carcinoma, the genome was compared to three previously sequenced genomes in this area. Result We obtained 42 contigs with a total length of 1,611,192 bp and predicted 1,687 coding sequences. Compared to strains isolated from gastritis and ulcers in this area, 10 different regions were identified as being unique for HLJ039; they mainly encoded type II restriction-modification enzyme, type II m6A methylase, DNA-cytosine methyltransferase, DNA methylase, and hypothetical proteins. A unique 547-bp fragment sharing 93% identity with a hypothetical protein of Helicobacter cinaedi ATCC BAA-847 was not present in any other previous H. pylori strains. Phylogenetic analysis based on core genome single nucleotide polymorphisms shows that HLJ039 is defined as hspEAsia subgroup, which belongs to the hpEastAsia group. Conclusion DNA methylations, variations of the genomic regions involved in restriction and modification systems, are the “hot” regions that may be related to the mechanism of H. pylori-induced gastric cancer. The genome sequence will provide useful information for the deep mining of potential mechanisms related to East Asian gastric cancer. PMID:24565107

  18. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  19. IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform.

    PubMed

    Lee, Tae-Rim; Ahn, Jin Mo; Kim, Gyuhee; Kim, Sangsoo

    2017-12-01

    Next-generation sequencing (NGS) technology has become a trend in the genomics research area. There are many software programs and automated pipelines to analyze NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, still remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform series of downstream analyses and obtain a more integrative understanding about various types of genomic data by interactively visualizing them with customizable options.

  20. Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.

    PubMed

    Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini

    2016-02-01

    Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most of the bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs on bacterial genomes and its physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA likely originated from heterologous DNA and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes encoding identical TAs of natural isolates are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderation of plasmid maintenance. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma.

    PubMed

    Gustafson, G; Hunter, B; Hanau, R; Armour, S L; Jackson, A O

    1987-06-01

    The complete nucleotide sequences of RNA gamma from the Type and ND18 strains of barley stripe mosaic virus (BSMV) have been determined. The sequences are 3164 (Type) and 2791 (ND18) nucleotides in length. Both sequences contain a 5'-noncoding region (87 or 88 nucleotides) which is followed by a long open reading frame (ORF1). A 42-nucleotide intercistronic region separates ORF1 from a second, shorter open reading frame (ORF2) located near the 3'-end of the RNA. There is a high degree of homology between the Type and ND18 strains in the nucleotide sequence of ORF1. However, the Type strain contains a 366 nucleotide direct tandem repeat within ORF1 which is absent in the ND18 strain. Consequently, the predicted translation product of Type RNA gamma ORF1 (mol wt 87,312) is significantly larger than that of ND18 RNA gamma ORF1 (mol wt 74,011). The amino acid sequence of the ORF1 polypeptide contains homologies with putative RNA polymerases from other RNA viruses, suggesting that this protein may function in replication of the BSMV genome. The nucleotide sequence of RNA gamma ORF2 is nearly identical in the Type and ND18 strains. ORF2 codes for a polypeptide with a predicted molecular weight of 17,209 (Type) or 17,074 (ND18) which is known to be translated from a subgenomic (sg) RNA. The initiation point of this sgRNA has been mapped to a location 27 nucleotides upstream of the ORF2 initiation codon in the intercistronic region between ORF1 and ORF2. The sgRNA is not coterminal with the 3'-end of the genomic RNA, but instead contains heterogeneous poly(A) termini up to 150 nucleotides long (J. Stanley, R. Hanau, and A. O. Jackson, 1984, Virology 139, 375-383). In the genomic RNA gamma, ORF2 is followed by a short poly(A) tract and a 238-nucleotide tRNA-like structure.

  2. Genomic characterization of two new enterovirus types, EV-A114 and EV-A121.

    PubMed

    Deshpande, Jagadish M; Sharma, Deepa K; Saxena, Vinay K; Shetty, Sushmitha A; Qureshi, Tarique Husain I H; Nalavade, Uma P

    2016-12-01

    Enteroviruses cause a variety of illnesses of the gastrointestinal tract, central nervous system and cardiovascular system. Phylogenetic analysis of VP1 sequences has identified 106 different human enteroviruses classified into four enterovirus species within the genus Enterovirus of the family Picornaviridae. It is likely that not all enterovirus types have been discovered. Between September 2013 and October 2014, stool samples of 6274 apparently healthy children of up to 5 years of age residing in Gorakhpur district, Uttar Pradesh, India were screened for enteroviruses. Virus isolates obtained in RD and Hep-2c cells were identified by complete VP1 sequencing. Enteroviruses were isolated from 3042 samples. A total of 87 different enterovirus types were identified. Two isolates with 71 and 74 % nucleotide sequence similarity to all other known enteroviruses were recognized as novel types. In this paper we report identification and complete genome sequence analysis of these two isolates classified as EV-A114 and EV-A121.

  3. Whole genome sequencing in clinical and public health microbiology

    PubMed Central

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    SummaryGenomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  4. Whole genome sequencing in clinical and public health microbiology.

    PubMed

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology.The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology.Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories.As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future.Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  5. Complete Whole-Genome Sequence of Salmonella enterica subsp. enterica Serovar Java NCTC5706.

    PubMed

    Fazal, Mohammed-Abbas; Alexander, Sarah; Burnett, Edward; Deheer-Graham, Ana; Oliver, Karen; Holroyd, Nancy; Parkhill, Julian; Russell, Julie E

    2016-11-03

    Salmonellae are a significant cause of morbidity and mortality globally. Here, we report the first complete genome sequence for Salmonella enterica subsp. enterica serovar Java strain NCTC5706. This strain is of historical significance, having been isolated in the pre-antibiotic era and was deposited into the National Collection of Type Cultures in 1939. © Crown copyright 2016.

  6. Genome Sequence of Streptococcus phocae subsp. salmonis Strain C-4T, Isolated from Atlantic Salmon (Salmo salar)

    PubMed Central

    Suarez, Rudy; Lazo, Eduardo; Bravo, Diego; Llegues, Katerina O.; Romalde, Jesús L.; Godoy, Marcos G.

    2014-01-01

    Streptococcus phocae subsp. salmonis is a fish pathogen that has an important impact on the Chilean salmon industry. Here, we report the genome sequence of the type strain C-4T isolated from Atlantic salmon (Salmo salar), showing a number of interesting features and genes related to its possible virulence factors. PMID:25502668

  7. Dynamic Evolution of Pathogenicity Revealed by Sequencing and Comparative Genomics of 19 Pseudomonas syringae Isolates

    PubMed Central

    Romanchuk, Artur; Chang, Jeff H.; Mukhtar, M. Shahid; Cherkis, Karen; Roach, Jeff; Grant, Sarah R.; Jones, Corbin D.; Dangl, Jeffery L.

    2011-01-01

    Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram- negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we demonstrate that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determination of host range. We used these 14 draft genome sequences, plus five additional genome sequences previously reported, to identify the core genome for P. syringae and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of unknown proteins, which dramatically increases both the genomic content of these strains and the pan-genome of the species. PMID:21799664

  8. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants.

    PubMed

    Fadista, João; Manning, Alisa K; Florez, Jose C; Groop, Leif

    2016-08-01

    Genome-wide association studies (GWAS) have long relied on proposed statistical significance thresholds to be able to differentiate true positives from false positives. Although the genome-wide significance P-value threshold of 5 × 10(-8) has become a standard for common-variant GWAS, it has not been updated to cope with the lower allele frequency spectrum used in many recent array-based GWAS studies and sequencing studies. Using a whole-genome- and -exome-sequencing data set of 2875 individuals of European ancestry from the Genetics of Type 2 Diabetes (GoT2D) project and a whole-exome-sequencing data set of 13 000 individuals from five ancestries from the GoT2D and T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) projects, we describe guidelines for genome- and exome-wide association P-value thresholds needed to correct for multiple testing, explaining the impact of linkage disequilibrium thresholds for distinguishing independent variants, minor allele frequency and ancestry characteristics. We emphasize the advantage of studying recent genetic isolate populations when performing rare and low-frequency genetic association analyses, as the multiple testing burden is diminished due to higher genetic homogeneity.

  9. Genome-based classification of micromonosporae with a focus on their biotechnological and ecological potential.

    PubMed

    Carro, Lorena; Nouioui, Imen; Sangal, Vartul; Meier-Kolthoff, Jan P; Trujillo, Martha E; Montero-Calasanz, Maria Del Carmen; Sahin, Nevzat; Smith, Darren Lee; Kim, Kristi E; Peluso, Paul; Deshpande, Shweta; Woyke, Tanja; Shapiro, Nicole; Kyrpides, Nikos C; Klenk, Hans-Peter; Göker, Markus; Goodfellow, Michael

    2018-01-11

    There is a need to clarify relationships within the actinobacterial genus Micromonospora, the type genus of the family Micromonosporaceae, given its biotechnological and ecological importance. Here, draft genomes of 40 Micromonospora type strains and two non-type strains are made available through the Genomic Encyclopedia of Bacteria and Archaea project and used to generate a phylogenomic tree which showed they could be assigned to well supported phyletic lines that were not evident in corresponding trees based on single and concatenated sequences of conserved genes. DNA G+C ratios derived from genome sequences showed that corresponding data from species descriptions were imprecise. Emended descriptions include precise base composition data and approximate genome sizes of the type strains. antiSMASH analyses of the draft genomes show that micromonosporae have a previously unrealised potential to synthesize novel specialized metabolites. Close to one thousand biosynthetic gene clusters were detected, including NRPS, PKS, terpenes and siderophores clusters that were discontinuously distributed thereby opening up the prospect of prioritising gifted strains for natural product discovery. The distribution of key stress related genes provide an insight into how micromonosporae adapt to key environmental variables. Genes associated with plant interactions highlight the potential use of micromonosporae in agriculture and biotechnology.

  10. Complete genome sequence of Rhodospirillum rubrum type strain (S1T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munk, Christine; Copeland, A; Lucas, Susan

    2011-01-01

    Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rho- dospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteo- bacteria. The species is of special interest because it is an anoxygenic phototroph that pro- duces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the most simple photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1T can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in partic- ularmore » for physiological and genetic studies. Next to R. centenum strain SW, the genome se- quence of strain S1T is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chro- mosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.« less

  11. Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex

    PubMed Central

    Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel

    2016-01-01

    The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs, show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. PMID:26915094

  12. Short-term evolution of Shiga toxin-producing Escherichia coli O157:H7 between two food-borne outbreaks.

    PubMed

    Cowley, Lauren A; Dallman, Timothy J; Fitzgerald, Stephen; Irvine, Neil; Rooney, Paul J; McAteer, Sean P; Day, Martin; Perry, Neil T; Bono, James L; Jenkins, Claire; Gally, David L

    2016-09-01

    Shiga toxin-producing Escherichia coli (STEC) O157:H7 is a public health threat and outbreaks occur worldwide. Here, we investigate genomic differences between related STEC O157:H7 that caused two outbreaks, eight weeks apart, at the same restaurant. Short-read genome sequencing divided the outbreak strains into two sub-clusters separated by only three single-nucleotide polymorphisms in the core genome while traditional typing identified them as separate phage types, PT8 and PT54. Isolates did not cluster with local strains but with those associated with foreign travel to the Middle East/North Africa. Combined long-read sequencing approaches and optical mapping revealed that the two outbreak strains had undergone significant microevolution in the accessory genome with prophage gain, loss and recombination. In addition, the PT54 sub-type had acquired a 240 kbp multi-drug resistance (MDR) IncHI2 plasmid responsible for the phage type switch. A PT54 isolate had a general fitness advantage over a PT8 isolate in rich medium, including an increased capacity to use specific amino acids and dipeptides as a nitrogen source. The second outbreak was considerably larger and there were multiple secondary cases indicative of effective human-to-human transmission. We speculate that MDR plasmid acquisition and prophage changes have adapted the PT54 strain for human infection and transmission. Our study shows the added insights provided by combining whole-genome sequencing approaches for outbreak investigations.

  13. Short-term evolution of Shiga toxin-producing Escherichia coli O157:H7 between two food-borne outbreaks

    PubMed Central

    Dallman, Timothy J.; Fitzgerald, Stephen; Irvine, Neil; Rooney, Paul J.; McAteer, Sean P.; Day, Martin; Perry, Neil T.; Bono, James L.; Jenkins, Claire; Gally, David L.

    2016-01-01

    Shiga toxin-producing Escherichia coli (STEC) O157:H7 is a public health threat and outbreaks occur worldwide. Here, we investigate genomic differences between related STEC O157:H7 that caused two outbreaks, eight weeks apart, at the same restaurant. Short-read genome sequencing divided the outbreak strains into two sub-clusters separated by only three single-nucleotide polymorphisms in the core genome while traditional typing identified them as separate phage types, PT8 and PT54. Isolates did not cluster with local strains but with those associated with foreign travel to the Middle East/North Africa. Combined long-read sequencing approaches and optical mapping revealed that the two outbreak strains had undergone significant microevolution in the accessory genome with prophage gain, loss and recombination. In addition, the PT54 sub-type had acquired a 240 kbp multi-drug resistance (MDR) IncHI2 plasmid responsible for the phage type switch. A PT54 isolate had a general fitness advantage over a PT8 isolate in rich medium, including an increased capacity to use specific amino acids and dipeptides as a nitrogen source. The second outbreak was considerably larger and there were multiple secondary cases indicative of effective human-to-human transmission. We speculate that MDR plasmid acquisition and prophage changes have adapted the PT54 strain for human infection and transmission. Our study shows the added insights provided by combining whole-genome sequencing approaches for outbreak investigations. PMID:28348875

  14. How, who, and when: preferences for delivery of genome sequencing results among women diagnosed with breast cancer at a young age.

    PubMed

    Kaphingst, Kimberly A; Ivanovich, Jennifer; Elrick, Ashley; Dresser, Rebecca; Matsen, Cindy; Goodman, Melody S

    2016-11-01

    The increasing use of genome sequencing with patients raises a critical communication challenge: return of secondary findings. While the issue of what sequencing results should be returned to patients has been examined, much less attention has been paid to developing strategies to return these results in ways that meet patients' needs and preferences. To address this, we investigated delivery preferences (i.e., who, how, when) for individual genome sequencing results among women diagnosed with breast cancer at age 40 or younger. We conducted 60 semistructured, in-person individual interviews to examine preferences for the return of different types of genome sequencing results and the reasons underlying these preferences. Two coders independently coded interview transcripts; analysis was conducted using NVivo 10. The major findings from the study were that: (1) many participants wanted sequencing results as soon as possible, even at the time of breast cancer diagnosis; (2) participants wanted an opportunity for an in-person discussion of results; and (3) they put less emphasis on the type of person delivering results than on the knowledge and communicative skills of that person. Participants also emphasized the importance of a results return process tailored to a patient's individual circumstances and one that she has a voice in determining. A critical goal for future transdisciplinary research including clinicians, patients, and communication researchers may be to develop decision-making processes to help patients make decisions about how they would like various sequencing results returned.

  15. APE-Type Non-LTR Retrotransposons of Multicellular Organisms Encode Virus-Like 2A Oligopeptide Sequences, Which Mediate Translational Recoding during Protein Synthesis

    PubMed Central

    Odon, Valerie; Luke, Garry A.; Roulston, Claire; Brown, Jeremy D.; Ryan, Martin D.; Sukhodub, Andriy

    2013-01-01

    2A oligopeptide sequences (“2As”) mediate a cotranslational recoding event termed “ribosome skipping.” Previously we demonstrated the activity of 2As (and “2A-like sequences”) within a wide range of animal RNA virus genomes and non-long terminal repeat retrotransposons (non-LTRs) in the genomes of the unicellular organisms Trypanosoma brucei (Ingi) and T. cruzi (L1Tc). Here, we report the presence of 2A-like sequences in the genomes of a wide range of multicellular organisms and, as in the trypanosome genomes, within non-LTR retrotransposons (non-LTRs)—clustering in the Rex1, Crack, L2, L2A, and CR1 clades, in addition to Ingi. These 2A-like sequences were tested for translational recoding activity, and highly active sequences were found within the Rex1, L2, CR1, and Ingi clades. The presence of 2A-like sequences within non-LTRs may not only represent a method of controlling protein biogenesis but also shows some correlation with such apurinic/apyrimidinic DNA endonuclease-type non-LTRs encoding one, rather than two, open reading frames (ORFs). Interestingly, such non-LTRs cluster with closely related elements lacking 2A-like recoding elements but retaining ORF1. Taken together, these observations suggest that acquisition of 2A-like translational recoding sequences may have played a role in the evolution of these elements. PMID:23728794

  16. Non-contiguous finished genome sequence and description of Alistipes timonensis sp. nov.

    PubMed Central

    Lagier, Jean-Christophe; Armougom, Fabrice; Mishra, Ajay Kumar; Nguyen, Thi-Tien; Raoult, Didier; Fournier, Pierre-Edouard

    2012-01-01

    Alistipes timonensis strain JC136T sp. nov. is the type strain of A. timonensis sp. nov., a new species within the genus Alistipes. This strain, whose genome is described here, was isolated from the fecal flora of a healthy patient. A. timonensis is an obligate anaerobic rod. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,497,779 bp long genome (one chromosome but no plasmid) contains 2,742 protein-coding and 50 RNA genes, including three rRNA genes. PMID:23408657

  17. Clan Genomics and the Complex Architecture of Human Disease

    PubMed Central

    Belmont, John W.; Boerwinkle, Eric

    2013-01-01

    Human diseases are caused by alleles that encompass the full range of variant types, from single-nucleotide changes to copy-number variants, and these variations span a broad frequency spectrum, from the very rare to the common. The picture emerging from analysis of whole-genome sequences, the 1000 Genomes Project pilot studies, and targeted genomic sequencing derived from very large sample sizes reveals an abundance of rare and private variants. One implication of this realization is that recent mutation may have a greater influence on disease susceptibility or protection than is conferred by variations that arose in distant ancestors. PMID:21962505

  18. Anonymization of electronic medical records for validating genome-wide association studies

    PubMed Central

    Loukides, Grigorios; Gkoulalas-Divanis, Aris; Malin, Bradley

    2010-01-01

    Genome-wide association studies (GWAS) facilitate the discovery of genotype–phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients’ standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data “as is” may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients’ identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt's University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks. PMID:20385806

  19. Patterns of Viral DNA Integration in Cells Transformed by Wild Type or DNA-Binding Protein Mutants of Adenovirus Type 5 and Effect of Chemical Carcinogens on Integration

    PubMed Central

    Dorsch-Häsler, Karoline; Fisher, Paul B.; Weinstein, I. Bernard; Ginsberg, Harold S.

    1980-01-01

    The integration pattern of viral DNA was studied in a number of cell lines transformed by wild-type adenovirus type 5 (Ad5 WT) and two mutants of the DNA-binding protein gene, H5ts125 and H5ts107. The effect of chemical carcinogens on the integration of viral DNA was also investigated. Liquid hybridization (C0t) analyses showed that rat embryo cells transformed by Ad5 WT usually contained only the left-hand end of the viral genome, whereas cell lines transformed by H5ts125 or H5ts107 at either the semipermissive (36°C) or nonpermissive (39.5°C) temperature often contained one to five copies of all or most of the entire adenovirus genome. The arrangement of the integrated adenovirus DNA sequences was determined by cleavage of transformed cell DNA with restriction endonucleases XbaI, EcoRI, or HindIII followed by transfer of separated fragments to nitrocellulose paper and hybridization according to the technique of E. M. Southern (J. Mol. Biol. 98: 503-517, 1975). It was found that the adenovirus genome is integrated as a linear sequence covalently linked to host cell DNA; that the viral DNA is integrated into different host DNA sequences in each cell line studied; that in cell lines that contain multiple copies of the Ad5 genome the viral DNA sequences can be integrated in a single set of host cell DNA sequences and not as concatemers; and that chemical carcinogens do not alter the extent or pattern of viral DNA integration. Images PMID:6246266

  20. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant.

    PubMed

    Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

    2017-08-16

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments including food processing plants where it can persist for month or years. In the present study the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and the investigation of a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes included a number of contigs ranging from 13 to 28 and N50 ranging from 456298 to 580604. The coverage ranged from 41 to 187X. The cgMLST showed a significantly superior discriminatory power only in comparison to ribotyping, nevertheless it allows the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, which 7-loci concatenated sequence differs for only 4 nucleotides to VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a fulllength inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated to attenuated virulence in all ST121 isolates.

  1. Genome sequence of the pink–pigmented marine bacterium Loktanella hongkongensis type strain (UST950701–009PT), a representative of the Roseobacter group

    DOE PAGES

    Lau, Stanley CK; Riedel, Thomas; Fiebig, Anne; ...

    2015-08-11

    Loktanella hongkongensis UST950701-009PT is a Gram-negative, non-motile and rod-shaped bacterium isolated from a marine biofilm in the subtropical seawater of Hong Kong. When growing as a monospecies biofilm on polystyrene surfaces, this bacterium is able to induce larval settlement and metamorphosis of a ubiquitous polychaete tubeworm Hydroides elegans. The inductive cues are low-molecular weight compounds bound to the exopolymeric matrix of the bacterial cells. In the present study we describe the features of L. hongkongensis strain DSM 17492T together with its genome sequence and annotation and novel aspects of its phenotype. The 3,198,444 bp long genome sequence encodes 3104 protein-codingmore » genes and 57 RNA genes. Lastly, the two unambiguously identified extrachromosomal replicons contain replication modules of the RepB and the Rhodobacteraceae-specific DnaA-like type, respectively.« less

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larimer, Frank W; Chain, Patrick S. G.; Hauser, Loren John

    Rhodopseudomonas palustris is among the most metabolically versatile bacteria known. It uses light, inorganic compounds, or organic compounds, for energy. It acquires carbon from many types of green plant-derived compounds or by carbon dioxide fixation, and it fixes nitrogen. Here we describe the genome sequence of R. palustris, which consists of a 5,459,213-base-pair (bp) circular chromosome with 4,836 predicted genes and a plasmid of 8,427 bp. The sequence reveals genes that confer a remarkably large number of options within a given type of metabolism, including three nitrogenases, five benzene ring cleavage pathways and four light harvesting 2 systems. R. palustrismore » encodes 63 signal transduction histidine kinases and 79 response regulator receiver domains. Almost 15% of the genome is devoted to transport. This genome sequence is a starting point to use R. palustris as a model to explore how organisms integrate metabolic modules in response to environmental perturbations.« less

  3. Genome sequence of the pink-pigmented marine bacterium Loktanella hongkongensis type strain (UST950701-009P(T)), a representative of the Roseobacter group.

    PubMed

    Lau, Stanley Ck; Riedel, Thomas; Fiebig, Anne; Han, James; Huntemann, Marcel; Petersen, Jörn; Ivanova, Natalia N; Markowitz, Victor; Woyke, Tanja; Göker, Markus; Kyrpides, Nikos C; Klenk, Hans-Peter; Qian, Pei-Yuan

    2015-01-01

    Loktanella hongkongensis UST950701-009P(T) is a Gram-negative, non-motile and rod-shaped bacterium isolated from a marine biofilm in the subtropical seawater of Hong Kong. When growing as a monospecies biofilm on polystyrene surfaces, this bacterium is able to induce larval settlement and metamorphosis of a ubiquitous polychaete tubeworm Hydroides elegans. The inductive cues are low-molecular weight compounds bound to the exopolymeric matrix of the bacterial cells. In the present study we describe the features of L. hongkongensis strain DSM 17492(T) together with its genome sequence and annotation and novel aspects of its phenotype. The 3,198,444 bp long genome sequence encodes 3104 protein-coding genes and 57 RNA genes. The two unambiguously identified extrachromosomal replicons contain replication modules of the RepB and the Rhodobacteraceae-specific DnaA-like type, respectively.

  4. Extraction of High Molecular Weight DNA from Fungal Rust Spores for Long Read Sequencing.

    PubMed

    Schwessinger, Benjamin; Rathjen, John P

    2017-01-01

    Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.

  5. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

    PubMed

    Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

    2013-06-03

    Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.

  6. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    PubMed Central

    2013-01-01

    Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509

  7. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

    PubMed

    Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique

    2016-03-16

    Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.

  8. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    PubMed Central

    Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

    2014-01-01

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ ()////-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396

  9. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

    DOE PAGES

    Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.; ...

    2017-06-12

    We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less

  10. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.

    We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less

  11. Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1,000 Genomes Project.

    PubMed

    Santpere, Gabriel; Darre, Fleur; Blanco, Soledad; Alcami, Antonio; Villoslada, Pablo; Mar Albà, M; Navarro, Arcadi

    2014-04-01

    Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. Infection by EBV is related to a number of diseases including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed EBV sequences infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. As strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads into the EBV reference genome (NC007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.

  12. CRISPR-based screening of genomic island excision events in bacteria.

    PubMed

    Selle, Kurt; Klaenhammer, Todd R; Barrangou, Rodolphe

    2015-06-30

    Genomic analysis of Streptococcus thermophilus revealed that mobile genetic elements (MGEs) likely contributed to gene acquisition and loss during evolutionary adaptation to milk. Clustered regularly interspaced short palindromic repeats-CRISPR-associated genes (CRISPR-Cas), the adaptive immune system in bacteria, limits genetic diversity by targeting MGEs including bacteriophages, transposons, and plasmids. CRISPR-Cas systems are widespread in streptococci, suggesting that the interplay between CRISPR-Cas systems and MGEs is one of the driving forces governing genome homeostasis in this genus. To investigate the genetic outcomes resulting from CRISPR-Cas targeting of integrated MGEs, in silico prediction revealed four genomic islands without essential genes in lengths from 8 to 102 kbp, totaling 7% of the genome. In this study, the endogenous CRISPR3 type II system was programmed to target the four islands independently through plasmid-based expression of engineered CRISPR arrays. Targeting lacZ within the largest 102-kbp genomic island was lethal to wild-type cells and resulted in a reduction of up to 2.5-log in the surviving population. Genotyping of Lac(-) survivors revealed variable deletion events between the flanking insertion-sequence elements, all resulting in elimination of the Lac-encoding island. Chimeric insertion sequence footprints were observed at the deletion junctions after targeting all of the four genomic islands, suggesting a common mechanism of deletion via recombination between flanking insertion sequences. These results established that self-targeting CRISPR-Cas systems may direct significant evolution of bacterial genomes on a population level, influencing genome homeostasis and remodeling.

  13. Whole genome typing of the recently emerged Canadian serogroup W Neisseria meningitidis sequence type 11 clonal complex isolates associated with invasive meningococcal disease.

    PubMed

    Tsang, Raymond S W; Ahmad, Tauqeer; Tyler, Shaun; Lefebvre, Brigitte; Deeks, Shelley L; Gilca, Rodica; Hoang, Linda; Tyrrell, Gregory; Van Caeseele, Paul; Van Domselaar, Gary; Jamieson, Frances B

    2018-04-01

    This study was performed to analyze the Canadian invasive serogroup W Neisseria meningitidis (MenW) sequence type 11 (ST-11) clonal complex (CC) isolates by whole genome typing and to compare Canadian isolates with similar isolates from elsewhere. Whole genome typing of 30 MenW ST-11 CC, 20 meningococcal group C (MenC) ST-11 CC, and 31 MenW ST-22 CC isolates was performed on the Bacterial Isolate Genome Sequence database platform. Canadian MenW ST-11 CC isolates were compared with the 2000 MenW Hajj outbreak strain, as well as with MenW ST-11 CC from other countries. Whole genome typing showed that the Canadian MenW ST-11 CC isolates were distinct from the traditional MenW ST-22 CC; they were not capsule-switched contemporary MenC strains that incorporated MenW capsules. While some recent MenW disease cases in Canada were caused by MenW ST-11 CC isolates showing relatedness to the 2000 MenW Hajj strain, many were non-Hajj isolates similar to current MenW ST-11 isolates found globally. Geographical and temporal variations in genotypes and surface protein antigen genes were found among the MenW ST-11 CC isolates. The current MenW ST-11 isolates did not arise by capsule switching from contemporary MenC ST-11 isolates. Both the Hajj-related and non-Hajj MenW ST-11 CC strains were associated with invasive meningococcal disease in Canada. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  14. Differential Distribution of Type II CRISPR-Cas Systems in Agricultural and Nonagricultural Campylobacter coli and Campylobacter jejuni Isolates Correlates with Lack of Shared Environments

    PubMed Central

    Pearson, Bruce M.; Louwen, Rogier; van Baarlen, Peter; van Vliet, Arnoud H.M.

    2015-01-01

    CRISPR (clustered regularly interspaced palindromic repeats)-Cas (CRISPR-associated) systems are sequence-specific adaptive defenses against phages and plasmids which are widespread in prokaryotes. Here we have studied whether phylogenetic relatedness or sharing of environmental niches affects the distribution and dissemination of Type II CRISPR-Cas systems, first in 132 bacterial genomes from 15 phylogenetic classes, ranging from Proteobacteria to Actinobacteria. There was clustering of distinct Type II CRISPR-Cas systems in phylogenetically distinct genera with varying G+C%, which share environmental niches. The distribution of CRISPR-Cas within a genus was studied using a large collection of genome sequences of the closely related Campylobacter species Campylobacter jejuni (N = 3,746) and Campylobacter coli (N = 486). The Cas gene cas9 and CRISPR-repeat are almost universally present in C. jejuni genomes (98.0% positive) but relatively rare in C. coli genomes (9.6% positive). Campylobacter jejuni and agricultural C. coli isolates share the C. jejuni CRISPR-Cas system, which is closely related to, but distinct from the C. coli CRISPR-Cas system found in C. coli isolates from nonagricultural sources. Analysis of the genomic position of CRISPR-Cas insertion suggests that the C. jejuni-type CRISPR-Cas has been transferred to agricultural C. coli. Conversely, the absence of the C. coli-type CRISPR-Cas in agricultural C. coli isolates may be due to these isolates not sharing the same environmental niche, and may be affected by farm hygiene and biosecurity practices in the agricultural sector. Finally, many CRISPR spacer alleles were linked with specific multilocus sequence types, suggesting that these can assist molecular epidemiology applications for C. jejuni and C. coli. PMID:26338188

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liolios, Konstantinos; Abt, Birte; Scheuner, Carmen

    Spirochaeta africana Zhilina et al. 1996 is an anaerobic, aerotolerant, spiral-shaped bacte- rium that is motile via periplasmic flagella. The type strain of the species, Z-7692T, was iso- lated in 1993 or earlier from a bacterial bloom in the brine under the trona layer in a shallow lagoon of the alkaline equatorial Lake Magadi in Kenya. Here we describe the features of this organism, together with the complete genome sequence, and annotation. Considering the pending reclassification of S. caldaria to the genus Treponema, S. africana is only the second 'true' member of the genus Spirochaeta with a genome-sequenced type strainmore » to be pub- lished. The 3,285,855 bp long genome of strain Z-7692T with its 2,817 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.« less

  16. In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas

    PubMed Central

    Tran, Phuong N.; Savka, Michael A.; Gan, Han Ming

    2017-01-01

    The genus Pseudomonas has one of the largest diversity of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on 16S rRNA gene or conventional biochemical assay, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three-hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans–P. oryzihabitans, and P. kilonensis- P. brassicacearum, that were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques. PMID:28747902

  17. Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

    PubMed

    Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

    2018-01-01

    An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F 9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.

  18. Cancer related gene alterations can be detected with next-generation sequencing analysis of bile in diffusely infiltrating type cholangiocarcinoma.

    PubMed

    Lee, Chang Hun; Wang, Hong En; Seo, Seung Young; Kim, Seong Hun; Kim, In Hee; Kim, Sang Wook; Lee, Soo Teik; Kim, Dae Ghon; Han, Myung Kwan; Lee, Seung Ok

    2016-08-01

    Genome-wide association study in diffusely infiltrating type cholangiocarcinoma (CC) can be limited due to the difficulty of obtaining tumor tissue. We aimed to evaluate the genomic alterations of diffusely infiltrating type CC using next-generation sequencing (NGS) of bile and to compare the variations with those of mass-forming type CC. A total of 24 bile samples obtained during endoscopic retrograde cholangiopancreatography (ERCP) and 17 surgically obtained tumor tissue samples were evaluated. Buffy coat and normal tissue samples were used as controls for a somatic mutation analysis. After extraction of genomic DNA, NGS analysis was performed for 48 cancer related genes. There were 27 men and 14 women with a mean age of 65.0±11.8years. The amount of extracted genomic DNA from 3cm(3) of bile was 66.0±84.7μg and revealed a high depth of sequencing coverage. All of the patients had genomic variations, with an average number of 19.4±2.8 and 22.3±3.3 alterations per patient from the bile and tumor tissue, respectively. After filtering process, damaging SNPs (8 sites for each type of CC) were predicted by analyzing tools, and their target genes showed relevant differences between the diffusely infiltrating and mass-forming type CC. Finally, in somatic mutation analysis, tumor-normal paired 14 tissue and 6 bile samples were analyzed, genomic alterations of EGFR, FGFR1, ABL1, PIK3CA, and CDKN2A gene were seen in the diffusely infiltrating type CC, and TP53, KRAS, APC, GNA11, ERBB4, ATM, SMAD4, BRAF, and IDH1 were altered in the mass-forming type CC group. STK11, GNAQ, RB1, KDR, and SMO genes were revealed in both groups. The NGS analysis was feasible with bile sample and diffusely infiltrating type CC revealed genetic differences compared with mass-forming type CC. Genome-wide association study could be performed using bile sample in the patients with CC undergoing ERCP and a different genetic approach for accurate diagnosis, pathogenesis study, and targeted therapy will be needed in diffusely infiltrating type CC. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    PubMed

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  20. Evaluation and validation of de novo and hybrid assembly techniques to derive high quality genome sequences

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; ...

    2014-06-14

    Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Our results show Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as anmore » additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. As to availability and implementation–all assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.« less

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klenk, Hans-Peter; Held, Brittany; Lucas, Susan

    Saccharomonospora azurea Runmao et al. 1987 is a member to the genomically so far poorly characterized genus Saccharomonospora in the family Pseudonocardiaceae. Members of the genus Sacharomonosoras are of interest because they originate from diverse habitats, such as leaf litter, manure, compost, surface of peat, moist and over-heated grain, where they might play a role in the primary degradation of plant material by attacking hemicellulose. They are Gram-negative staining organisms classified among the usually Gram-positive actinomycetes. Next to S. viridis, S. azurea is only the second member in the genus Saccharomonospora for with a completely sequenced type strain genome willmore » be published. Here we describe the features of this organism, together with the complete genome sequence with project status 'permanent draft', and annotation. The 4,763,832 bp long chromosome with its 4,472 protein-coding and 58 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).« less

  2. The Draft Genome Sequence of a Novel High-Efficient Butanol-Producing Bacterium Clostridium Diolis Strain WST.

    PubMed

    Chen, Chaoyang; Sun, Chongran; Wu, Yi-Rui

    2018-03-21

    A wild-type solventogenic strain Clostridium diolis WST, isolated from mangrove sediments, was characterized to produce high amount of butanol and acetone with negligible level of ethanol and acids from glucose via a unique acetone-butanol (AB) fermentation pathway. Through the genomic sequencing, the assembled draft genome of strain WST is calculated to be 5.85 Mb with a GC content of 29.69% and contains 5263 genes that contribute to the annotation of 5049 protein-coding sequences. Within these annotated genes, the butanol dehydrogenase gene (bdh) was determined to be in a higher amount from strain WST compared to other Clostridial strains, which is positively related to its high-efficient production of butanol. Therefore, we present a draft genome sequence analysis of strain WST in this article that should facilitate to further understand the solventogenic mechanism of this special microorganism.

  3. Community-led comparative genomic and phenotypic analysis of the aquaculture pathogen Pseudomonas baetica a390T sequenced by Ion semiconductor and Nanopore technologies

    PubMed Central

    Beaton, Ainsley; Lood, Cédric; Cunningham-Oakes, Edward; MacFadyen, Alison; Mullins, Alex J; Bestawy, Walid El; Botelho, João; Chevalier, Sylvie; Dalzell, Chloe; Dolan, Stephen K; Faccenda, Alberto; Ghequire, Maarten G K; Higgins, Steven; Kutschera, Alexander; Murray, Jordan; Redway, Martha; Salih, Talal; Smith, Brian A; Smits, Nathan; Thomson, Ryan; Woodcock, Stuart; Cornelis, Pierre; Lavigne, Rob; van Noort, Vera

    2018-01-01

    Abstract Pseudomonas baetica strain a390T is the type strain of this recently described species and here we present its high-contiguity draft genome. To celebrate the 16th International Conference on Pseudomonas, the genome of P. baetica strain a390T was sequenced using a unique combination of Ion Torrent semiconductor and Oxford Nanopore methods as part of a collaborative community-led project. The use of high-quality Ion Torrent sequences with long Nanopore reads gave rapid, high-contiguity and -quality, 16-contig genome sequence. Whole genome phylogenetic analysis places P. baetica within the P. koreensis clade of the P. fluorescens group. Comparison of the main genomic features of P. baetica with a variety of other Pseudomonas spp. suggests that it is a highly adaptable organism, typical of the genus. This strain was originally isolated from the liver of a diseased wedge sole fish, and genotypic and phenotypic analyses show that it is tolerant to osmotic stress and to oxytetracycline. PMID:29579234

  4. Complete genome sequence of the bacteriochlorophyll a-containing Roseibacterium elongatum type strain (DSM 19469T), a representative of the Roseobacter group isolated from Australian coast sand

    PubMed Central

    Riedel, Thomas; Fiebig, Anne; Göker, Markus; Klenk, Hans-Peter

    2014-01-01

    Roseibacterium elongatum Suzuki et al. 2006 is a pink-pigmented and bacteriochlorophyll a-producing representative of the Roseobacter group within the alphaproteobacterial family Rhodobacteraceae. Representatives of the marine ‘Roseobacter group’ were found to be abundant in the ocean and play an important role in global and biogeochemical processes. In the present study we describe the features of R. elongatum strain OCh 323T together with its genome sequence and annotation. The 3,555,102 bp long genome consists of one circular chromosome with no extrachromosomal elements and is one of the smallest known Roseobacter genomes. It contains 3,540 protein-coding genes and 59 RNA genes. Genome analysis revealed the presence of a photosynthetic gene cluster, which putatively enables a photoheterotrophic lifestyle. Gene sequences associated with quorum sensing, motility, surface attachment, and thiosulfate and carbon monoxide oxidation could be detected. The genome was sequenced as part of the activities of the Transregional Collaborative Research Centre 51 (TRR51) funded by the German Research Foundation (DFG). PMID:25197467

  5. Complete genome sequence of the bacteriochlorophyll a-containing Roseibacterium elongatum type strain (DSM 19469(T)), a representative of the Roseobacter group isolated from Australian coast sand.

    PubMed

    Riedel, Thomas; Fiebig, Anne; Göker, Markus; Klenk, Hans-Peter

    2014-06-15

    Roseibacterium elongatum Suzuki et al. 2006 is a pink-pigmented and bacteriochlorophyll a-producing representative of the Roseobacter group within the alphaproteobacterial family Rhodobacteraceae. Representatives of the marine 'Roseobacter group' were found to be abundant in the ocean and play an important role in global and biogeochemical processes. In the present study we describe the features of R. elongatum strain OCh 323(T) together with its genome sequence and annotation. The 3,555,102 bp long genome consists of one circular chromosome with no extrachromosomal elements and is one of the smallest known Roseobacter genomes. It contains 3,540 protein-coding genes and 59 RNA genes. Genome analysis revealed the presence of a photosynthetic gene cluster, which putatively enables a photoheterotrophic lifestyle. Gene sequences associated with quorum sensing, motility, surface attachment, and thiosulfate and carbon monoxide oxidation could be detected. The genome was sequenced as part of the activities of the Transregional Collaborative Research Centre 51 (TRR51) funded by the German Research Foundation (DFG).

  6. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    PubMed Central

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005

  7. Toward the 1,000 dollars human genome.

    PubMed

    Bennett, Simon T; Barnes, Colin; Cox, Anthony; Davies, Lisa; Brown, Clive

    2005-06-01

    Revolutionary new technologies, capable of transforming the economics of sequencing, are providing an unparalleled opportunity to analyze human genetic variation comprehensively at the whole-genome level within a realistic timeframe and at affordable costs. Current estimates suggest that it would cost somewhere in the region of 30 million US dollars to sequence an entire human genome using Sanger-based sequencing, and on one machine it would take about 60 years. Solexa is widely regarded as a company with the necessary disruptive technology to be the first to achieve the ultimate goal of the so-called 1,000 dollars human genome - the conceptual cost-point needed for routine analysis of individual genomes. Solexa's technology is based on completely novel sequencing chemistry capable of sequencing billions of individual DNA molecules simultaneously, a base at a time, to enable highly accurate, low cost analysis of an entire human genome in a single experiment. When applied over a large enough genomic region, these new approaches to resequencing will enable the simultaneous detection and typing of known, as well as unknown, polymorphisms, and will also offer information about patterns of linkage disequilibrium in the population being studied. Technological progress, leading to the advent of single-molecule-based approaches, is beginning to dramatically drive down costs and increase throughput to unprecedented levels, each being several orders of magnitude better than that which is currently available. A new sequencing paradigm based on single molecules will be faster, cheaper and more sensitive, and will permit routine analysis at the whole-genome level.

  8. Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome.

    PubMed

    Nicholson, Matthew J; Theodorou, Michael K; Brookman, Jayne L

    2005-01-01

    The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens and lampirin. This gene was present as a single copy in Orpinomyces, was expressed during vegetative growth and was also detected in genomes from another gut fungal genus, Neocallimastix.

  9. Mitogenomes from type specimens, a genotyping tool for morphologically simple species: ten genomes of agar-producing red algae.

    PubMed

    Boo, Ga Hun; Hughey, Jeffery R; Miller, Kathy Ann; Boo, Sung Min

    2016-10-14

    DNA sequences from type specimens provide independent, objective characters that enhance the value of type specimens and permit the correct application of species names to phylogenetic clades and specimens. We provide mitochondrial genomes (mitogenomes) from archival type specimens of ten species in agar-producing red algal genera Gelidium and Pterocladiella. The genomes contain 43-44 genes, ranging in size from 24,910 to 24,970 bp with highly conserved gene synteny. Low Ka/Ks ratios of apocytochrome b and cytochrome oxidase genes support their utility as markers. Phylogenies of mitogenomes and cox1+rbcL sequences clarified classification at the genus and species levels. Three species formerly in Gelidium and Pterocladia are transferred to Pterocladiella: P. media comb. nov., P. musciformis comb. nov., and P. luxurians comb. and stat. nov. Gelidium sinicola is merged with G. coulteri because they share identical cox1 and rbcL sequences. We describe a new species, Gelidium millariana sp. nov., previously identified as G. isabelae from Australia. We demonstrate that mitogenomes from type specimens provide a new tool for typifying species in the Gelidiales and that there is an urgent need for analyzing mitogenomes from type specimens of red algae and other morphologically simple organisms for insight into their nomenclature, taxonomy and evolution.

  10. Mitogenomes from type specimens, a genotyping tool for morphologically simple species: ten genomes of agar-producing red algae

    PubMed Central

    Boo, Ga Hun; Hughey, Jeffery R.; Miller, Kathy Ann; Boo, Sung Min

    2016-01-01

    DNA sequences from type specimens provide independent, objective characters that enhance the value of type specimens and permit the correct application of species names to phylogenetic clades and specimens. We provide mitochondrial genomes (mitogenomes) from archival type specimens of ten species in agar-producing red algal genera Gelidium and Pterocladiella. The genomes contain 43–44 genes, ranging in size from 24,910 to 24,970 bp with highly conserved gene synteny. Low Ka/Ks ratios of apocytochrome b and cytochrome oxidase genes support their utility as markers. Phylogenies of mitogenomes and cox1+rbcL sequences clarified classification at the genus and species levels. Three species formerly in Gelidium and Pterocladia are transferred to Pterocladiella: P. media comb. nov., P. musciformis comb. nov., and P. luxurians comb. and stat. nov. Gelidium sinicola is merged with G. coulteri because they share identical cox1 and rbcL sequences. We describe a new species, Gelidium millariana sp. nov., previously identified as G. isabelae from Australia. We demonstrate that mitogenomes from type specimens provide a new tool for typifying species in the Gelidiales and that there is an urgent need for analyzing mitogenomes from type specimens of red algae and other morphologically simple organisms for insight into their nomenclature, taxonomy and evolution. PMID:27739454

  11. Centromere and telomere sequence alterations reflect the rapid genome evolution within the carnivorous plant genus Genlisea.

    PubMed

    Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Neumann, Pavel; Novák, Petr; Fojtová, Miloslava; Vu, Giang T H; Macas, Jiří; Fajkus, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-12-01

    Linear chromosomes of eukaryotic organisms invariably possess centromeres and telomeres to ensure proper chromosome segregation during nuclear divisions and to protect the chromosome ends from deterioration and fusion, respectively. While centromeric sequences may differ between species, with arrays of tandemly repeated sequences and retrotransposons being the most abundant sequence types in plant centromeres, telomeric sequences are usually highly conserved among plants and other organisms. The genome size of the carnivorous genus Genlisea (Lentibulariaceae) is highly variable. Here we study evolutionary sequence plasticity of these chromosomal domains at an intrageneric level. We show that Genlisea nigrocaulis (1C = 86 Mbp; 2n = 40) and G. hispidula (1C = 1550 Mbp; 2n = 40) differ as to their DNA composition at centromeres and telomeres. G. nigrocaulis and its close relative G. pygmaea revealed mainly 161 bp tandem repeats, while G. hispidula and its close relative G. subglabra displayed a combination of four retroelements at centromeric positions. G. nigrocaulis and G. pygmaea chromosome ends are characterized by the Arabidopsis-type telomeric repeats (TTTAGGG); G. hispidula and G. subglabra instead revealed two intermingled sequence variants (TTCAGG and TTTCAGG). These differences in centromeric and, surprisingly, also in telomeric DNA sequences, uncovered between groups with on average a > 9-fold genome size difference, emphasize the fast genome evolution within this genus. Such intrageneric evolutionary alteration of telomeric repeats with cytosine in the guanine-rich strand, not yet known for plants, might impact the epigenetic telomere chromatin modification. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  12. Draft genome sequence of Actinotignum schaalii DSM 15541T: Genetic insights into the lifestyle, cell fitness and virulence.

    PubMed

    Yassin, Atteyet F; Langenberg, Stefan; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Mukherjee, Supratim; Reddy, T B K; Daum, Chris; Shapiro, Nicole; Ivanova, Natalia; Woyke, Tanja; Kyrpides, Nikos C

    2017-01-01

    The permanent draft genome sequence of Actinotignum schaalii DSM 15541T is presented. The annotated genome includes 2,130,987 bp, with 1777 protein-coding and 58 rRNA-coding genes. Genome sequence analysis revealed absence of genes encoding for: components of the PTS systems, enzymes of the TCA cycle, glyoxylate shunt and gluconeogensis. Genomic data revealed that A. schaalii is able to oxidize carbohydrates via glycolysis, the nonoxidative pentose phosphate and the Entner-Doudoroff pathways. Besides, the genome harbors genes encoding for enzymes involved in the conversion of pyruvate to lactate, acetate and ethanol, which are found to be the end products of carbohydrate fermentation. The genome contained the gene encoding Type I fatty acid synthase required for de novo FAS biosynthesis. The plsY and plsX genes encoding the acyltransferases necessary for phosphatidic acid biosynthesis were absent from the genome. The genome harbors genes encoding enzymes responsible for isoprene biosynthesis via the mevalonate (MVA) pathway. Genes encoding enzymes that confer resistance to reactive oxygen species (ROS) were identified. In addition, A. schaalii harbors genes that protect the genome against viral infections. These include restriction-modification (RM) systems, type II toxin-antitoxin (TA), CRISPR-Cas and abortive infection system. A. schaalii genome also encodes several virulence factors that contribute to adhesion and internalization of this pathogen such as the tad genes encoding proteins required for pili assembly, the nanI gene encoding exo-alpha-sialidase, genes encoding heat shock proteins and genes encoding type VII secretion system. These features are consistent with anaerobic and pathogenic lifestyles. Finally, resistance to ciprofloxacin occurs by mutation in chromosomal genes that encode the subunits of DNA-gyrase (GyrA) and topisomerase IV (ParC) enzymes, while resistant to metronidazole was due to the frxA gene, which encodes NADPH-flavin oxidoreductase.

  13. The Sequenced Angiosperm Genomes and Genome Databases.

    PubMed

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  14. WhopGenome: high-speed access to whole-genome variation and sequence data in R.

    PubMed

    Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J

    2015-02-01

    The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) file format into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. lercher@cs.uni-duesseldorf.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. The Sequenced Angiosperm Genomes and Genome Databases

    PubMed Central

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology. PMID:29706973

  16. Complete genomic sequence of campylobacter jejuni subsp. jejuni HS:19 penner reference strain

    USDA-ARS?s Scientific Manuscript database

    Campylobacter jejuni subsp. jejuni (Cjj) infections are a leading cause of foodborne gastroenteritis and the most prevalent antecedent to Guillain-Barré syndrome (GBS). Capsular type Penner HS:19 is among several capsule types shown to be markers for GBS. This study describes the genome of Cjj HS:19...

  17. Permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, a thermoacidophilic sulfur-reducing crenarchaeon isolated from acidic hot springs of Hveravellir, Iceland.

    PubMed

    Susanti, Dwi; Johnson, Eric F; Lapidus, Alla; Han, James; Reddy, T B K; Pilay, Manoj; Ivanova, Natalia N; Markowitz, Victor M; Woyke, Tanja; Kyrpides, Nikos C; Mukhopadhyay, Biswarup

    2016-01-01

    This report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. Existence of several hydrogenase genes in the genome supported previous findings that H2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilization systems also exist in the D. mobilis genome though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional mean for comparative genomic analysis of desulfurococci. In addition, our analysis on the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.

  18. Permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, a thermoacidophilic sulfur-reducing crenarchaeon isolated from acidic hot springs of Hveravellir, Iceland

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Susanti, Dwi; Johnson, Eric F.; Lapidus, Alla

    Our report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H 2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. Existence of several hydrogenase genes in the genome supported previous findings that H 2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilizationmore » systems also exist in the D. mobilis genome though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional mean for comparative genomic analysis of desulfurococci. In addition, our analysis on the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.« less

  19. Permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, a thermoacidophilic sulfur-reducing crenarchaeon isolated from acidic hot springs of Hveravellir, Iceland

    DOE PAGES

    Susanti, Dwi; Johnson, Eric F.; Lapidus, Alla; ...

    2016-01-13

    Our report presents the permanent draft genome sequence of Desulfurococcus mobilis type strain DSM 2161, an obligate anaerobic hyperthermophilic crenarchaeon that was isolated from acidic hot springs in Hveravellir, Iceland. D. mobilis utilizes peptides as carbon and energy sources and reduces elemental sulfur to H 2S. A metabolic construction derived from the draft genome identified putative pathways for peptide degradation and sulfur respiration in this archaeon. Existence of several hydrogenase genes in the genome supported previous findings that H 2 is produced during the growth of D. mobilis in the absence of sulfur. Interestingly, genes encoding glucose transport and utilizationmore » systems also exist in the D. mobilis genome though this archaeon does not utilize carbohydrate for growth. The draft genome of D. mobilis provides an additional mean for comparative genomic analysis of desulfurococci. In addition, our analysis on the Average Nucleotide Identity between D. mobilis and Desulfurococcus mucosus suggested that these two desulfurococci are two different strains of the same species.« less

  20. A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum.

    PubMed

    Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F; Li, Shuaicheng; Hu, Kailin

    2016-01-07

    The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall of simple sequence repeats (SSR) distribution in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker raging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum.

  1. A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum

    PubMed Central

    Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L.; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F.; Li, Shuaicheng; Hu, Kailin

    2016-01-01

    The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall of simple sequence repeats (SSR) distribution in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker raging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum. PMID:26739748

  2. [Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].

    PubMed

    Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong

    2008-05-01

    One DRV strain was isolated from Sika Deer brain and sequenced. Nine overlapped gene fragments were amplified by RT-PCR through 3'-RACE and 5'-RACE method, and the complete DRV genome sequence was assembled. The length of the complete genome is 11863bp. The DRV genome organization was similar to other rabies viruses which were composed of five genes and the initiation sites and termination sites were highly conservative. There were mutated amino acids in important antigen sites of nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of gene N, P, M, G, L in strains with completed genomie sequencing were compared. Compared with N gene sequence of other typical rabies viruses, a phylogenetic tree was established . These results indicated that DRV belonged to gene type 1. The highest homology compared with Chinese vaccine strain 3aG was 94%, and the lowest was 71% compared with WCBV. These findings provided theoretical reference for further research in rabies virus.

  3. In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Aloreña green table olives

    PubMed Central

    Pérez Montoro, Beatriz; Casado Muñoz, María del Carmen; Knapp, Charles W.; Gálvez, Antonio; Benomar, Nabil

    2017-01-01

    Lactobacillus pentosus MP-10, isolated from brines of naturally fermented Aloreña green table olives, exhibited high probiotic potential. The genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli, highlighting the microorganism’s ecological flexibility and adaptability. Here, we analyzed the complete genome sequence for the presence of acquired antibiotic resistance and virulence determinants to understand their defense mechanisms and explore its putative safety in food. The annotated genome sequence revealed evidence of diverse mobile genetic elements, such as prophages, transposases and transposons involved in their adaptation to brine-associated niches. In-silico analysis of L. pentosus MP-10 genome sequence identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) as an immune system against foreign genetic elements, which consisted of six arrays (4–12 repeats) and eleven predicted cas genes [CRISPR1 and CRISPR2 consisted of 3 (Type II-C) and 8 (Type I) genes] with high similarity to L. pentosus KCA1. Bioinformatic analyses revealed L. pentosus MP-10 to be absent of acquired antibiotic resistance genes, and most resistance genes were related to efflux mechanisms; no virulence determinants were found in the genome. This suggests that L. pentosus MP-10 could be considered safe and with high-adaptation potential, which could facilitate its application as a starter culture and probiotic in food preparations. PMID:28651019

  4. Dynamics and Differential Proliferation of Transposable Elements During the Evolution of the B and A Genomes of Wheat

    PubMed Central

    Charles, Mathieu; Belcram, Harry; Just, Jérémy; Huneau, Cécile; Viollet, Agnès; Couloux, Arnaud; Segurens, Béatrice; Carter, Meredith; Huteau, Virginie; Coriton, Olivier; Appels, Rudi; Samain, Sylvie; Chalhoub, Boulos

    2008-01-01

    Transposable elements (TEs) constitute >80% of the wheat genome but their dynamics and contribution to size variation and evolution of wheat genomes (Triticum and Aegilops species) remain unexplored. In this study, 10 genomic regions have been sequenced from wheat chromosome 3B and used to constitute, along with all publicly available genomic sequences of wheat, 1.98 Mb of sequence (from 13 BAC clones) of the wheat B genome and 3.63 Mb of sequence (from 19 BAC clones) of the wheat A genome. Analysis of TE sequence proportions (as percentages), ratios of complete to truncated copies, and estimation of insertion dates of class I retrotransposons showed that specific types of TEs have undergone waves of differential proliferation in the B and A genomes of wheat. While both genomes show similar rates and relatively ancient proliferation periods for the Athila retrotransposons, the Copia retrotransposons proliferated more recently in the A genome whereas Gypsy retrotransposon proliferation is more recent in the B genome. It was possible to estimate for the first time the proliferation periods of the abundant CACTA class II DNA transposons, relative to that of the three main retrotransposon superfamilies. Proliferation of these TEs started prior to and overlapped with that of the Athila retrotransposons in both genomes. However, they also proliferated during the same periods as Gypsy and Copia retrotransposons in the A genome, but not in the B genome. As estimated from their insertion dates and confirmed by PCR-based tracing analysis, the majority of differential proliferation of TEs in B and A genomes of wheat (87 and 83%, respectively), leading to rapid sequence divergence, occurred prior to the allotetraploidization event that brought them together in Triticum turgidum and Triticum aestivum, <0.5 million years ago. More importantly, the allotetraploidization event appears to have neither enhanced nor repressed retrotranspositions. We discuss the apparent proliferation of TEs as resulting from their insertion, removal, and/or combinations of both evolutionary forces. PMID:18780739

  5. Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.

    PubMed

    Xu, Jianping

    2006-06-01

    Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools have revolutionized microbial ecological studies and drastically expanded our view on the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, the applications of the genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have identified that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA-DNA re-association kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicate the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses coupled to the application and further development of high throughput technologies are accelerating the pace of discovery in microbial ecology.

  6. LINE-1 Elements in Structural Variation and Disease

    PubMed Central

    Beck, Christine R.; Garcia-Perez, José Luis; Badge, Richard M.; Moran, John V.

    2014-01-01

    The completion of the human genome reference sequence ushered in a new era for the study and discovery of human transposable elements. It now is undeniable that transposable elements, historically dismissed as junk DNA, have had an instrumental role in sculpting the structure and function of our genomes. In particular, long interspersed element-1 (LINE-1 or L1) and short interspersed elements (SINEs) continue to affect our genome, and their movement can lead to sporadic cases of disease. Here, we briefly review the types of transposable elements present in the human genome and their mechanisms of mobility. We next highlight how advances in DNA sequencing and genomic technologies have enabled the discovery of novel retrotransposons in individual genomes. Finally, we discuss how L1-mediated retrotransposition events impact human genomes. PMID:21801021

  7. Analysis of Multiallelic CNVs by Emulsion Haplotype Fusion PCR.

    PubMed

    Tyson, Jess; Armour, John A L

    2017-01-01

    Emulsion-fusion PCR recovers long-range sequence information by combining products in cis from individual genomic DNA molecules. Emulsion droplets act as very numerous small reaction chambers in which different PCR products from a single genomic DNA molecule are condensed into short joint products, to unite sequences in cis from widely separated genomic sites. These products can therefore provide information about the arrangement of sequences and variants at a larger scale than established long-read sequencing methods. The method has been useful in defining the phase of variants in haplotypes, the typing of inversions, and determining the configuration of sequence variants in multiallelic CNVs. In this description we outline the rationale for the application of emulsion-fusion PCR methods to the analysis of multiallelic CNVs, and give practical details for our own implementation of the method in that context.

  8. Draft Genome Sequence of the Marine Bacterium Pseudomonas aestusnigri VGXO14T.

    PubMed

    Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2017-08-10

    The type strain of Pseudomonas aestusnigri (VGXO14), isolated from a crude oil-polluted marine sand sample, is a member of the P. pertucinogena phylogenetic group. Here, we report the genome sequence (3.83 Mb) of P. aestusnigri to gain insights into the biology and taxonomy of marine Pseudomonas spp. adapted to polluted marine habitats. Copyright © 2017 Gomila et al.

  9. Draft Genome Sequence of the Marine Bacterium Pseudomonas aestusnigri VGXO14T

    PubMed Central

    2017-01-01

    ABSTRACT The type strain of Pseudomonas aestusnigri (VGXO14), isolated from a crude oil-polluted marine sand sample, is a member of the P. pertucinogena phylogenetic group. Here, we report the genome sequence (3.83 Mb) of P. aestusnigri to gain insights into the biology and taxonomy of marine Pseudomonas spp. adapted to polluted marine habitats. PMID:28798177

  10. Draft genome sequence of a multidrug-resistant Aeromonas hydrophila ST508 strain carrying rmtD and blaCTX-M-131 isolated from a bloodstream infection.

    PubMed

    Moura, Quézia; Fernandes, Miriam R; Cerdeira, Louise; Santos, Ana Carolina M; de Souza, Tiago A; Ienne, Susan; Pignatari, Antonio Carlos C; Gales, Ana C; Silva, Rosa M; Lincopan, Nilton

    2017-09-01

    Here we report the draft genome sequence of a multidrug-resistant (MDR) Aeromonas hydrophila strain belonging to sequence type 508 (ST508) isolated from a human bloodstream infection. Assembly and annotation of this draft genome resulted in 5028498bp and revealed the presence of 16S rRNA methylase rmtD and bla CTX-M-131 genes encoding high-level resistance to aminoglycosides and cephalosporins, respectively, as well as multiple virulence genes. This draft genome can provide significant information for understanding mechanisms on the establishment and treatment of infections caused by this pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  11. Next-generation sequencing of cancer genomes: back to the future

    PubMed Central

    Walter, Matthew J; Graubert, Timothy A; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J

    2010-01-01

    The systematic karyotyping of bone marrow cells was the first genomic approach used to personalize therapy for patients with leukemia. The paradigm established by cytogenetic studies in leukemia (from gene discovery to therapeutic intervention) now has the potential to be rapidly extended with the use of whole-genome sequencing approaches for cancer, which are now possible. We are now entering a period of exponential growth in cancer gene discovery that will provide many novel therapeutic targets for a large number of cancer types. Establishing the pathogenetic relevance of individual mutations is a major challenge that must be solved. However, after thousands of cancer genomes have been sequenced, the genetic rules of cancer will become known and new approaches for diagnosis, risk stratification and individualized treatment of cancer patients will surely follow. PMID:20161678

  12. Phylogenetic Distribution of CRISPR-Cas Systems in Antibiotic-Resistant Pseudomonas aeruginosa.

    PubMed

    van Belkum, Alex; Soriaga, Leah B; LaFave, Matthew C; Akella, Srividya; Veyrieras, Jean-Baptiste; Barbu, E Magda; Shortridge, Dee; Blanc, Bernadette; Hannum, Gregory; Zambardi, Gilles; Miller, Kristofer; Enright, Mark C; Mugnier, Nathalie; Brami, Daniel; Schicklin, Stéphane; Felderman, Martina; Schwartz, Ariel S; Richardson, Toby H; Peterson, Todd C; Hubby, Bolyn; Cady, Kyle C

    2015-11-24

    Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune system function. However, little information on the phylogenetic distribution and potential role of these CRISPR-Cas systems in molding the P. aeruginosa accessory genome and antibiotic resistance elements is known. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process, we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. P. aeruginosa is both an antibiotic-refractory pathogen and an important model system for type I CRISPR-Cas bacterial immune systems. By combining the genome sequences of 672 newly and previously sequenced genomes, we were able to provide a global view of the phylogenetic distribution, conservation, and potential targets of these systems. This analysis identified a new and putatively mobile P. aeruginosa CRISPR-Cas subtype, characterized the diverse distribution of known CRISPR-inhibiting genes, and provided a potential new use for CRISPR spacer libraries in accessory genome analysis. Our data demonstrated the importance of CRISPR-Cas systems in modulating the accessory genomes of globally distributed strains while also providing substantial data for subsequent genomic and experimental studies in multiple fields. Understanding why certain genotypes of P. aeruginosa are clinically prevalent and adept at horizontally acquiring virulence and antibiotic resistance elements is of major clinical and economic importance. Copyright © 2015 van Belkum et al.

  13. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    PubMed

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  14. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes

    PubMed Central

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-01-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404

  15. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results.

    PubMed

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B

    2015-07-01

    Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.

  16. Dispositional Optimism and Perceived Risk Interact to Predict Intentions to Learn Genome Sequencing Results

    PubMed Central

    Taber, Jennifer M.; Klein, William M. P.; Ferrer, Rebecca A.; Lewis, Katie L.; Biesecker, Leslie G.; Biesecker, Barbara B.

    2015-01-01

    Objective Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Method Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Results Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. Conclusions The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. PMID:25313897

  17. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

    PubMed

    Roberts, Richard J; Vincze, Tamas; Posfai, Janos; Macelis, Dana

    2015-01-01

    REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems. It contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data. All genomes that are completely sequenced are analyzed for RM system components, and with the advent of PacBio sequencing, the recognition sequences of DNA methyltransferases (MTases) are appearing rapidly. Thus, Type I and Type III systems can now be characterized in terms of recognition specificity merely by DNA sequencing. The contents of REBASE may be browsed from the web http://rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Draft genome sequence of Sugiyamaella xylanicola UFMG-CM-Y1884T, a xylan-degrading yeast species isolated from rotting wood samples in Brazil.

    PubMed

    Batista, Thiago M; Moreira, Rennan G; Hilário, Heron O; Morais, Camila G; Franco, Glória R; Rosa, Luiz H; Rosa, Carlos A

    2017-03-01

    We present the draft genome sequence of the type strain of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T  = CBS 12683 T ), a xylan-degrading species capable of fermenting d-xylose to ethanol. The assembled genome has a size of ~ 13.7 Mb and a GC content of 33.8% and contains 5971 protein-coding genes. We identified 15 genes with significant similarity to the d-xylose reductase gene from several other fungal species. The draft genome assembled from whole-genome shotgun sequencing of the yeast Sugiyamaella xylanicola UFMG-CM-Y1884 T (= UFMG-CA-32.1 T  = CBS 12683 T ) has been deposited at DDBJ/ENA/GenBank under the accession number MQSX00000000 under version MQSX01000000.

  19. High quality draft genome sequence of Brachymonas chironomi AIMA4T (DSM 19884T) isolated from a Chironomus sp. egg mass

    DOE PAGES

    Laviad, Sivan; Lapidus, Alla; Han, James; ...

    2015-05-27

    Brachymonas chironomi strain AIMA4T (Halpern et al., 2009) is a Gram-negative, non-motile, aerobic, chemoorganotroph bacterium. B. chironomi is a member of the Comamonadaceae, a family within the class Betaproteobacteria. This species was isolated from a chironomid (Diptera; Chironomidae) egg mass, sampled from a waste stabilization pond in northern Israel. Phylogenetic analysis based on the 16S rRNA gene sequences placed strain AIMA4T in the genus Brachymonas. Here we describe the features of this organism, together with the complete genome sequence and annotation. We find the DNA GC content is 63.5%. The chromosome length is 2,509,395 bp. It encodes 2,382 proteins andmore » 68 RNA genes. Brachymonas chironomi genome is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less

  20. Detection of BRAF mutations from solid tumors using Tumorplex™ technology

    PubMed Central

    Yo, Jacob; Hay, Katie S.L.; Vinayagamoorthy, Dilanthi; Maryanski, Danielle; Carter, Mark; Wiegel, Joseph; Vinayagamoorthy, Thuraiayah

    2015-01-01

    Allele specific multiplex sequencing (Tumorplex™) is a new molecular platform for the detection of single base mutation in tumor biopsies with high sensitivity for clinical testing. Tumorplex™ is a novel modification of Sanger sequencing technology that generates both mutant and wild type nucleotide sequences simultaneously in the same electropherogram. The molecular weight of the two sequencing primers are different such that the two sequences generated are separated, thus eliminating possible suppression of mutant signal by the more abundant wild type signal. Tumorplex™ platform technology was tested using BRAF mutation V600E. These studies were performed with cloned BRAF mutations and genomic DNA extracted from tumor cells carrying 50% mutant allele. The lower limit of detection for BRAF V600E was found to be 20 genome equivalents (GE) using genomic DNA extracted from mutation specific cell lines. Sensitivity of the assay was tested by challenging the mutant allele with wild type allele at 20 GE, and was able to detect BRAF mutant signal at a GE ration of 20:1 × 107 (mutant to wild-type). This level of sensitivity can detect low abundance of clonal mutations in tumor biopsies and eliminate the need for cell enrichment. • Tumorplex™ is a single tube assay that permits the recognition of mutant allele without suppression by wildtype signal. • Tumorplex™ provides a high level of sensitivity. • Tumorplex™ can be used with small sample size with mixed population of cells carrying heterogeneous gDNA. PMID:26258049

  1. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  2. Complete genome sequence of Polynucleobacter necessarius subsp. asymbioticus type strain (QLW-P1DMWA-1T)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meincke, Linda; Copeland, A; Lapidus, Alla L.

    2012-01-01

    Polynucleobacter necessarius subsp. asymbioticus Hahn et al. 2009 is one of currently two subspecies of P. necessarius. While P. necessarius subsp. asymbioticus is a free-living bacterium, the closely related second subspecies, P. necessarius subsp. necessarius is an obligate endosymbiont living in the cytoplasm of freshwater ciliates of the genus Euplotes aediculatus. The two P. necessarius subspecies were the closest thus far reported phylogenetic neighbors that differ in their lifestyle as obligately free-living vs. obligate endosymbiontic, and they are the only members of the genus Polynucleobacter with completely sequenced genomes. The genome-sequenced strain represents a group of closely related strains notmore » distinguishable by 16S rRNA, 16S-23S ITS or glnA sequences, which is persistent in the home habitat of the strain and frequently contributes > 10% of total bacterial numbers in water samples of the habitat. The 2,159,490 bp long chromosome with a total of 2,088 protein-coding and 48 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2006.« less

  3. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    PubMed

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  4. Genome Comparison of Candida orthopsilosis Clinical Strains Reveals the Existence of Hybrids between Two Distinct Subspecies

    PubMed Central

    Pryszcz, Leszek P.; Németh, Tibor; Gácser, Attila; Gabaldón, Toni

    2014-01-01

    The Candida parapsilosis species complex comprises a group of emerging human pathogens of varying virulence. This complex was recently subdivided into three different species: C. parapsilosis sensu stricto, C. metapsilosis, and C. orthopsilosis. Within the latter, at least two clearly distinct subspecies seem to be present among clinical isolates (Type 1 and Type 2). To gain insight into the genomic differences between these subspecies, we undertook the sequencing of a clinical isolate classified as Type 1 and compared it with the available sequence of a Type 2 clinical strain. Unexpectedly, the analysis of the newly sequenced strain revealed a highly heterozygous genome, which we show to be the consequence of a hybridization event between both identified subspecies. This implicitly suggests that C. orthopsilosis is able to mate, a so-far unanswered question. The resulting hybrid shows a chimeric genome that maintains a similar gene dosage from both parental lineages and displays ongoing loss of heterozygosity. Several of the differences found between the gene content in both strains relate to virulent-related families, with the hybrid strain presenting a higher copy number of genes coding for efflux pumps or secreted lipases. Remarkably, two clinical strains isolated from distant geographical locations (Texas and Singapore) are descendants of the same hybrid line, raising the intriguing possibility of a relationship between the hybridization event and the global spread of a virulent clone. PMID:24747362

  5. Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305 T), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scheuner, Carmen; Tindall, Brian J.; Lu, Megan

    Planctomyces brasiliensis Schlesner 1990 belongs to the order Planctomycetales, which differs from other bacterial taxa by several distinctive features such as internal cell compartmentalization, multiplication by forming buds directly from the spherical, ovoid or pear-shaped mother cell and a cell wall consisting of a proteinaceous layer rather than a peptidoglycan layer. The first strains of P. brasiliensis, including the type strain IFAM 1448 T, were isolated from a water sample of Lagoa Vermelha, a salt pit near Rio de Janeiro, Brasil. This is the second completed genome sequence of a type strain of the genus Planctomyces to be published andmore » the sixth type strain genome sequence from the family Planctomycetaceae. The 6,006,602 bp long genome with its 4,811 protein-coding and 54 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. We study phylogenomic analyses that indicate that the classification within the Planctomycetaceae is partially in conflict with its evolutionary history, as the positioning of Schlesneria renders the genus Planctomyces paraphyletic. A re-analysis of published fatty-acid measurements also does not support the current arrangement of the two genera. A quantitative comparison of phylogenetic and phenotypic aspects indicates that the three Planctomyces species with type strains available in public culture collections should be placed in separate genera. Thus the genera Gimesia, Planctopirus and Rubinisphaera are proposed to accommodate P. maris, P. limnophilus and P. brasiliensis, respectively. Pronounced differences between the reported G + C content of Gemmata obscuriglobus, Singulisphaera acidiphila and Zavarzinella formosa and G + C content calculated from their genome sequences call for emendation of their species descriptions. Lastly, in addition to other features, the range of G + C values reported for the genera within the Planctomycetaceae indicates that the descriptions of the family and the order should be emended.« less

  6. Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305 T), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae

    DOE PAGES

    Scheuner, Carmen; Tindall, Brian J.; Lu, Megan; ...

    2014-12-08

    Planctomyces brasiliensis Schlesner 1990 belongs to the order Planctomycetales, which differs from other bacterial taxa by several distinctive features such as internal cell compartmentalization, multiplication by forming buds directly from the spherical, ovoid or pear-shaped mother cell and a cell wall consisting of a proteinaceous layer rather than a peptidoglycan layer. The first strains of P. brasiliensis, including the type strain IFAM 1448 T, were isolated from a water sample of Lagoa Vermelha, a salt pit near Rio de Janeiro, Brasil. This is the second completed genome sequence of a type strain of the genus Planctomyces to be published andmore » the sixth type strain genome sequence from the family Planctomycetaceae. The 6,006,602 bp long genome with its 4,811 protein-coding and 54 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. We study phylogenomic analyses that indicate that the classification within the Planctomycetaceae is partially in conflict with its evolutionary history, as the positioning of Schlesneria renders the genus Planctomyces paraphyletic. A re-analysis of published fatty-acid measurements also does not support the current arrangement of the two genera. A quantitative comparison of phylogenetic and phenotypic aspects indicates that the three Planctomyces species with type strains available in public culture collections should be placed in separate genera. Thus the genera Gimesia, Planctopirus and Rubinisphaera are proposed to accommodate P. maris, P. limnophilus and P. brasiliensis, respectively. Pronounced differences between the reported G + C content of Gemmata obscuriglobus, Singulisphaera acidiphila and Zavarzinella formosa and G + C content calculated from their genome sequences call for emendation of their species descriptions. Lastly, in addition to other features, the range of G + C values reported for the genera within the Planctomycetaceae indicates that the descriptions of the family and the order should be emended.« less

  7. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

    PubMed Central

    2011-01-01

    Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423

  8. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.

    PubMed

    Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A

    2011-08-08

    Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.

  9. Complete genome sequences of avian paramyxovirus type 8 strains goose/Delaware/1053/76 and pintail/Wakuya/20/78

    PubMed Central

    Paldurai, Anandan; Subbiah, Madhuri; Kumar, Sachin; Collins, Peter L.; Samal, Siba K.

    2009-01-01

    Complete consensus genome sequences were determined for avian paramyxovirus type 8 (APMV-8) strains goose/Delaware/1053/76 (prototype strain) and pintail/Wakuya/20/78. The genome of each strain is 15,342 nucleotides (nt) long, which follows the “rule of six”. The genome consists of six genes in the order of 3′-N-P/V/W-M-F-HN-L-5′. The genes are flanked on either side by conserved transcription start and stop signals, and have intergenic regions ranging from 1 to 30 nt. The genome contains a 55 nt leader region at the 3′-end and a 171 nt trailer region at the 5′-end. Comparison of sequences of strains Delaware and Wakuya showed nucleotide identity of 96.8% at the genome level and amino acid identities of 99.3%, 96.5%, 98.6%, 99.4%, 98.6% and 99.1% for the predicted N, P, M, F, HN and L proteins, respectively. Both strains grew in embryonated chicken eggs and in primary chicken embryo kidney cells, and 293T cells. Both strains contained only a single basic residue at the cleavage activation site of the F protein and their efficiency of replication in vitro depended on and was augmented by, the presence of exogenous protease in most cell lines. Sequence alignment and phylogenic analysis of the predicted amino acid sequence of APMV-8 strain Delaware proteins with the cognate proteins of other available APMV serotypes showed that APMV-8 is more closely related to APMV-2 and -6 than to APMV-1, -3 and -4. PMID:19341613

  10. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    PubMed Central

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sobottka, Marcelo, E-mail: sobottka@mtm.ufsc.br; Hart, Andrew G., E-mail: ahart@dim.uchile.cl

    Highlights: {yields} We propose a simple stochastic model to construct primitive DNA sequences. {yields} The model provide an explanation for Chargaff's second parity rule in primitive DNA sequences. {yields} The model is also used to predict a novel type of strand symmetry in primitive DNA sequences. {yields} We extend the results for bacterial DNA sequences and compare distributional properties intrinsic to the model to statistical estimates from 1049 bacterial genomes. {yields} We find out statistical evidences that the novel type of strand symmetry holds for bacterial DNA sequences. -- Abstract: Chargaff's second parity rule for short oligonucleotides states that themore » frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.« less

  12. Non contiguous-finished genome sequence and description of Enorma timonensis sp. nov.

    PubMed Central

    Ramasamy, Dhamodaran; Dubourg, Gregory; Robert, Catherine; Caputo, Aurelia; Papazian, Laurent; Raoult, Didier; Fournier, Pierre-Edouard

    2014-01-01

    Enorma timonensis strain GD5T sp. nov., is the type strain of E. timonensis sp. nov., a new member of the genus Enorma within the family Coriobacteriaceae. This strain, whose genome is described here, was isolated from the fecal flora of a 53-year-old woman hospitalized for 3 months in an intensive care unit. E. timonensis is an obligate anaerobic rod. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,365,123 bp long genome (1 chromosome but no plasmid) contains 2,060 protein-coding and 52 RNA genes, including 4 rRNA genes. PMID:25197477

  13. Non contiguous-finished genome sequence and description of Peptoniphilus obesi sp. nov.

    PubMed Central

    Mishra, Ajay Kumar; Hugon, Perrine; Lagier, Jean-Christophe; Nguyen, Thi-Thien; Robert, Catherine; Couderc, Carine; Raoult, Didier

    2013-01-01

    Peptoniphilus obesi strain ph1T sp. nov., is the type strain of P. obesi sp. nov., a new species within the genus Peptoniphilus. This strain, whose genome is described here, was isolated from the fecal flora of a 26-year-old woman suffering from morbid obesity. P. obesi strain ph1T is a Gram-positive, obligate anaerobic coccus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,774,150 bp long genome (1 chromosome but no plasmid) contains 1,689 protein-coding and 29 RNA genes, including 5 rRNA genes. PMID:24019985

  14. Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants.

    PubMed

    Pasquali, Lorenzo; Gaulton, Kyle J; Rodríguez-Seguí, Santiago A; Mularoni, Loris; Miguel-Escalada, Irene; Akerman, İldem; Tena, Juan J; Morán, Ignasi; Gómez-Marín, Carlos; van de Bunt, Martijn; Ponsa-Cobas, Joan; Castro, Natalia; Nammo, Takao; Cebola, Inês; García-Hurtado, Javier; Maestro, Miguel Angel; Pattou, François; Piemonti, Lorenzo; Berney, Thierry; Gloyn, Anna L; Ravassard, Philippe; Skarmeta, José Luis Gómez; Müller, Ferenc; McCarthy, Mark I; Ferrer, Jorge

    2014-02-01

    Type 2 diabetes affects over 300 million people, causing severe complications and premature death, yet the underlying molecular mechanisms are largely unknown. Pancreatic islet dysfunction is central in type 2 diabetes pathogenesis, and understanding islet genome regulation could therefore provide valuable mechanistic insights. We have now mapped and examined the function of human islet cis-regulatory networks. We identify genomic sequences that are targeted by islet transcription factors to drive islet-specific gene activity and show that most such sequences reside in clusters of enhancers that form physical three-dimensional chromatin domains. We find that sequence variants associated with type 2 diabetes and fasting glycemia are enriched in these clustered islet enhancers and identify trait-associated variants that disrupt DNA binding and islet enhancer activity. Our studies illustrate how islet transcription factors interact functionally with the epigenome and provide systematic evidence that the dysregulation of islet enhancers is relevant to the mechanisms underlying type 2 diabetes.

  15. Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster.

    PubMed

    Hargreaves, Adam D; Zhou, Long; Christensen, Josef; Marlétaz, Ferdinand; Liu, Shiping; Li, Fang; Jansen, Peter Gildsig; Spiga, Enrico; Hansen, Matilde Thye; Pedersen, Signe Vendelbo Horn; Biswas, Shameek; Serikawa, Kyle; Fox, Brian A; Taylor, William R; Mulley, John Frederick; Zhang, Guojie; Heller, R Scott; Holland, Peter W H

    2017-07-18

    The sand rat Psammomys obesus is a gerbil species native to deserts of North Africa and the Middle East, and is constrained in its ecology because high carbohydrate diets induce obesity and type II diabetes that, in extreme cases, can lead to pancreatic failure and death. We report the sequencing of the sand rat genome and discovery of an unusual, extensive, and mutationally biased GC-rich genomic domain. This highly divergent genomic region encompasses several functionally essential genes, and spans the ParaHox cluster which includes the insulin-regulating homeobox gene Pdx1. The sequence of sand rat Pdx1 has been grossly affected by GC-biased mutation, leading to the highest divergence observed for this gene across the Bilateria. In addition to genomic insights into restricted caloric intake in a desert species, the discovery of a localized chromosomal region subject to elevated mutation suggests that mutational heterogeneity within genomes could influence the course of evolution.

  16. Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

    PubMed Central

    Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe

    2014-01-01

    The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented are here based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967

  17. Comparative analysis of mitochondrial genomes between a wheat K-type cytoplasmic male sterility (CMS) line and its maintainer line.

    PubMed

    Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin

    2011-03-29

    Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.

  18. Integration of Genomic and Other Epidemiologic Data to Investigate and Control a Cross-Institutional Outbreak of Streptococcus pyogenes.

    PubMed

    Chalker, Victoria J; Smith, Alyson; Al-Shahib, Ali; Botchway, Stella; Macdonald, Emily; Daniel, Roger; Phillips, Sarah; Platt, Steven; Doumith, Michel; Tewolde, Rediat; Coelho, Juliana; Jolley, Keith A; Underwood, Anthony; McCarthy, Noel D

    2016-06-01

    Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of 2 independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.

  19. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

    PubMed

    Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

    2017-11-14

    Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.

  20. The Genome Sequence of Methanohalophilus mahii SLP T Reveals Differences in the Energy Metabolism among Members of the Methanosarcinaceae Inhabiting Freshwater and Saline Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spring, Stefan; Scheuner, Carmen; Lapidus, Alla

    Methanohalophilus mahii is the type species of the genus Methanohalophilus , which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLP T was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructedmore » energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.« less

Top