Science.gov

Sample records for genome sequencing centers

  1. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  2. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington

  4. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    PubMed Central

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256

  5. Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Crow, John [National Center for Genome Resources]

    2013-01-25

    John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  6. Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Crow, John

    2012-06-01

    John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. Integration of PacBio RS into Massive Parallel Sequencing and Data Analysis Pipelining at the UC Davis Genome Center

    PubMed Central

    Rashbrook, Vanessa; O'Geen, Henriette; Nguyen, Oanh; Ashtari, Siranoosh; Fan, Xiaohong; Kim, Ryan

    2013-01-01

    Whole genome sequencing and genomic biology have been widely adopted in many fields of biology as next-generation sequencing (NGS) technology has rapidly improved quality, read length, and throughput to make whole genome sequencing and association studies possible in a very cost-effective manner. Continued improvement and development of sample preparation protocols and data analysis tools have been significant in helping to extend genome sequencing technology to genomes that were previously difficult to sequence. The recent arrival of the Pacific Biosciences RS (PacBio) has furthered such opportunities by providing options for single-molecule, long-read sequencing in real time and kinetic analysis (methylation). PacBio has been employed successfully for sequencing low-complexity genomic regions such as extremely high GC content, long repeats, rearrangements, gene fusions, etc. In this poster we present the optimization of PacBio sample preparation that was fine-tuned to meet the unique challenges of sequencing through “difficult-to-sequence” templates. We discuss the integration of PacBio into the wet lab equipped with other NGS platforms and the data pipelining workflow, including cloud computing and robotic sample preparation, at the Genome Center. The UC Davis Genome Center currently operates NGS technology platforms including HiSeq, MiSeq, and PacBio, and has genotyping capacity using Illumina Infinium and GoldenGate technology. The UC Davis Genome Center and Bioinformatics Program provide the most up-to-date genome technology and informatics support tailored for specific biological goals, meeting the needs of more than 80 faculty members within the Genome Center and more than 200 campus and off-campus researchers.

  8. Complete Genome Sequence of Cupriavidus basilensis 4G11, Isolated from the Oak Ridge Field Research Center Site

    PubMed Central

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.

    2015-01-01

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%. PMID:25977418

  9. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site

    DOE PAGES Beta

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-14

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%.

  10. Whole Genome Sequencing

    MedlinePlus

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  11. Funding Opportunity: Genomic Data Centers

    Cancer.gov

    Funding Opportunity CCG, Funding Opportunity Center for Cancer Genomics, CCG, Center for Cancer Genomics, CCG RFA, Center for cancer genomics rfa, genomic data analysis network, genomic data analysis network centers,

  12. Next-generation sequencing-based genome diagnostics across clinical genetics centers: implementation choices and their effects

    PubMed Central

    Vrijenhoek, Terry; Kraaijeveld, Ken; Elferink, Martin; de Ligt, Joep; Kranendonk, Elcke; Santen, Gijs; Nijman, Isaac J; Butler, Derek; Claes, Godelieve; Costessi, Adalberto; Dorlijn, Wim; van Eyndhoven, Winfried; Halley, Dicky J J; van den Hout, Mirjam C G N; van Hove, Steven; Johansson, Lennart F; Jongbloed, Jan D H; Kamps, Rick; Kockx, Christel E M; de Koning, Bart; Kriek, Marjolein; Lekanne dit Deprez, Ronald; Lunstroo, Hans; Mannens, Marcel; Mook, Olaf R; Nelen, Marcel; Ploem, Corrette; Rijnen, Marco; Saris, Jasper J; Sinke, Richard; Sistermans, Erik; van Slegtenhorst, Marjon; Sleutels, Frank; van der Stoep, Nienke; van Tienhoven, Marianne; Vermaat, Martijn; Vogel, Maartje; Waisfisz, Quinten; Marjan Weiss, Janneke; van den Wijngaard, Arthur; van Workum, Wilbert; Ijntema, Helger; van der Zwaag, Bert; van IJcken, Wilfred FJ; den Dunnen, Johan; Veltman, Joris A; Hennekam, Raoul; Cuppen, Edwin

    2015-01-01

    Implementation of next-generation DNA sequencing (NGS) technology into routine diagnostic genome care requires strategic choices. Instead of theoretical discussions on the consequences of such choices, we compared NGS-based diagnostic practices in eight clinical genetic centers in the Netherlands, based on genetic testing of nine pre-selected patients with cardiomyopathy. We highlight critical implementation choices, including the specific contributions of laboratory and medical specialists, bioinformaticians and researchers to diagnostic genome care, and how these affect interpretation and reporting of variants. Reported pathogenic mutations were consistent for all but one patient. Of the two centers that were inconsistent in their diagnosis, one reported to have found ‘no causal variant’, thereby underdiagnosing this patient. The other provided an alternative diagnosis, identifying a different variant as causal than the other centers did. Ethical and legal analysis showed that informed consent procedures in all centers were generally adequate for diagnostic NGS applications that target a limited set of genes, but not for exome- and genome-based diagnosis. We propose changes to further improve and align these procedures, taking into account the blurring boundary between diagnostics and research, and specific counseling options for exome- and genome-based diagnostics. We conclude that alternative diagnoses may imply a certain level of ‘greediness’ to come to a positive diagnosis in interpreting sequencing results. Moreover, there is an increasing interdependence of clinic, diagnostics and research departments for comprehensive diagnostic genome care. Therefore, we invite clinical geneticists, physicians, researchers, bioinformatics experts and patients to reconsider their role and position in future diagnostic genome care. PMID:25626705

  13. Genomic Sequencing in Cancer

    PubMed Central

    Tuna, Musaffe; Amos, Christopher I.

    2013-01-01

    Genomic sequencing has provided critical insights into the etiology of both simple and complex diseases. The enormous reductions in cost for whole genome sequencing have allowed this technology to gain increasing use. Whole genome analysis has impacted research of complex diseases including cancer by allowing the systematic analysis of entire genomes in a single experiment, thereby facilitating the discovery of somatic and germline mutations, and identification of the function and impact of the insertions, deletions, and structural rearrangements, including translocations and inversions, in novel disease genes. Whole-genome sequencing can be used to provide the most comprehensive characterization of the cancer genome, the complexity of which we are only beginning to understand. Hence in this review, we focus on whole-genome sequencing in cancer. PMID:23178448

  14. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site

    SciTech Connect

    Ray, Jayashree; Waters, R. Jordan; Skerker, Jeffrey M.; Kuehl, Jennifer V.; Price, Morgan N.; Huang, Jiawen; Chakraborty, Romy; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-14

    Cupriavidus basilensis 4G11 was isolated from groundwater at the Oak Ridge Field Research Center (FRC) site. Here, we report the complete genome sequence and annotation of Cupriavidus basilensis 4G11. The genome contains 8,421,483 bp, 7,661 predicted protein-coding genes, and a total GC content of 64.4%.

  15. Comparisons of eukaryotic genomic sequences.

    PubMed Central

    Karlin, S; Ladunga, I

    1994-01-01

    A method for assessing genomic similarity based on relative abundances of short oligonucleotides in large DNA samples is introduced. The method requires neither homologous sequences nor prior sequence alignments. The analysis centers on (i) dinucleotide (and tri- and tetra-) relative abundance extremes in genomic sequences, (ii) distances between sequences based on all dinucleotide relative abundance values, and (iii) a multidimensional partial ordering protocol. The emphasis in this paper is on assessments of general relatedness of genomes as distinguished from phylogenetic reconstructions. Our methods demonstrate that the relative abundance distances almost always differ more for genomic interspecific sequence comparisons than for genomic intraspecific sequence comparisons, indicating congruence over different genome sequence samples. The genomic comparisons are generally concordant with accepted phylogenies among vertebrate and among fungal species sequences. Several unexpected relationships between the major groups of metazoa, fungal, and protist DNA emerge, including the following. (i) Schizosaccharomyces pombe and Saccharomyces cerevisiae in dinucleotide relative abundance distances are as similar to each other as human is to bovine. (ii) S. cerevisiae, although substantially far from, is significantly closer to the vertebrates than are the invertebrates (Drosophila melanogaster, Bombyx mori, and Caenorhabditis elegans). This phenomenon may suggest variable evolutionary rates during the metazoan radiations and slower changes in the fungal divergences, and/or a polyphyletic origin of metazoa. (iii) The genomic sequences of D. melanogaster and Trypanosoma brucei are strikingly similar. This DNA similarity might be explained by some molecular adaptation of the parasite to its dipteran (tsetse fly) host, a host-parasite gene transfer hypothesis. Robustness of the methods may be due to a genomic signature of dinucleotide relative abundance values reflecting DNA structures related to dinucleotide stacking energies, constraints of DNA curvature, and mechanisms attendant to replication, repair, and recombination. PMID:7809130
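
    The dinucleotide relative abundance distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited formulation, rho_XY = f_XY / (f_X * f_Y), computed over a sequence together with its reverse complement, and a distance equal to the mean absolute difference of the 16 rho values. The abstract itself does not spell out these formulas, so the function names and details below are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundances(seq):
    """Return rho[XY] = f_XY / (f_X * f_Y), counted over both strands.

    Assumes the sequence has at least two unambiguous bases; dinucleotides
    containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter()
    di = Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(c for c in strand if c in BASES)
        di.update(
            strand[i:i + 2]
            for i in range(len(strand) - 1)
            if strand[i] in BASES and strand[i + 1] in BASES
        )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def delta_distance(seq_a, seq_b):
    """Mean absolute difference over the 16 dinucleotide rho values."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

    Because counts are pooled over both strands, a sequence and its reverse complement have distance zero in this sketch, which is why the measure needs no alignment or strand orientation.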

  16. Center for Cancer Genomics | Office of Cancer Genomics

    Cancer.gov

    The Center for Cancer Genomics (CCG) was established to unify the National Cancer Institute's activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approach

  17. The Genome Center at Washington University

    SciTech Connect

    Fulton, Bob

    2010-06-02

    Bob Fulton of Washington University discusses the sequencing platforms in use at this large scale genome center on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  18. SINGLE CELL GENOME SEQUENCING

    PubMed Central

    Yilmaz, Suzan; Singh, Anup K.

    2011-01-01

    Whole genome amplification and next-generation sequencing of single cells has become a powerful approach for studying uncultivated microorganisms that represent 90–99% of all environmental microbes. Single cell sequencing enables not only the identification of microbes but also linking of functions to species, a feat not achievable by metagenomic techniques. Moreover, it allows the analysis of low abundance species that may be missed in community-based analyses. It has also proved very useful in complementing metagenomics in the assembly and binning of single genomes. With the advent of drastically cheaper and higher throughput sequencing technologies, it is expected that single cell sequencing will become a standard tool in studying the genome and transcriptome of microbial communities. PMID:22154471

  19. Assessment of incidental findings in 232 whole exome sequences from the Baylor-Hopkins Center for Mendelian Genomics

    PubMed Central

    Jurgens, Julie; Ling, Hua; Hetrick, Kurt; Pugh, Elizabeth; Schiettecatte, Francois; Doheny, Kimberly; Hamosh, Ada; Avramopoulos, Dimitri; Valle, David; Sobreira, Nara

    2014-01-01

    Purpose: In March 2013, the ACMG published a list of 56 genes with the recommendation that pathogenic and likely pathogenic variants detected incidentally by clinical sequencing should be reported to patients. As an initial step in determining the practical consequences of this recommendation in the research setting, we searched for variants in these genes in 232 whole exome sequences from the Baylor-Hopkins Center for Mendelian Genomics. Methods: We identified rare, nonsynonymous and splicing SNVs and indels and assessed variant classification using HGMD, Emory and ClinVar databases. We analyzed the burden of mutation in each of the 56 genes and determined which variants should be reported to patients. Results: Our filtering resulted in 249 distinct variants, with a mean of 1.69 variants per individual. Half of these were novel missense mutations not classified by any of the 3 reference databases. Of 101 variants listed in HGMD, 48 were also in ClinVar and 3 were also in Emory; half of these shared variants were classified discordantly between databases. Some genes consistently had greater variation than others. In total, 0.86% of individuals had a reportable incidental variant. Conclusion: These observations demonstrate some current challenges of assessing phenotypic consequences of incidental variants for counseling patients. PMID:25569433

  20. Towards Sequencing Cotton (Gossypium) Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  1. Assessing inhomogeneities in bacterial long genomic sequences

    SciTech Connect

    Karlin, S.

    1997-12-01

    Several complete prokaryotic and eukaryotic genomes are already at hand (S. cerevisiae, H. influenzae, M. genitalium, M. jannaschii, Synechocystis sp.) and many are forthcoming (e.g., E. coli, H. pylori, C. elegans). The comparative analysis of genomes generally strives to identify genes and characterize function/structure relationships inferred mostly via amino acid sequence comparisons. We describe concisely methods for comparing genomes (or long contigs) emphasizing sequence features other than gene comparisons. These center on the following measures of genomic organization and sequence heterogeneity: (i) compositional biases of short oligonucleotides; (ii) dinucleotide relative abundance distances within and between genomes; (iii) rare and frequent word (oligonucleotide) determinations and their distributional properties; (iv) r-scan statistics assessing clustering, overdispersion, or excessive evenness of various marker arrays; and (v) characterizations of repeat structures in the genome. 20 refs., 3 figs.

  2. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
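
    To put the two error rates quoted in the abstract above in perspective, here is a small arithmetic sketch. The 5 Mbp genome length is a hypothetical value chosen for illustration; only the rates (about 1 error per 2,000 bp for a draft, about 1 per 10,000 bp for a finished genome) come from the abstract.

```python
def expected_errors(genome_length_bp, bp_per_error):
    """Expected number of consensus errors at one error per bp_per_error bases."""
    return genome_length_bp / bp_per_error


GENOME_LENGTH_BP = 5_000_000    # hypothetical genome size, for illustration only
DRAFT_BP_PER_ERROR = 2_000      # draft assembly rate quoted in the abstract
FINISHED_BP_PER_ERROR = 10_000  # finished genome rate quoted in the abstract

draft = expected_errors(GENOME_LENGTH_BP, DRAFT_BP_PER_ERROR)
finished = expected_errors(GENOME_LENGTH_BP, FINISHED_BP_PER_ERROR)

print(f"draft: {draft:.0f} errors, finished: {finished:.0f} errors")
print(f"finishing reduces expected errors {draft / finished:.0f}-fold")
```

    Under these assumptions, finishing cuts the expected error count fivefold regardless of genome size, since the ratio depends only on the two rates.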

  3. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott; Thykaer, Jette; Adney, William S; Brettin, Tom; Brockman, Fred; Dhaeseleer, Patrick; Martinez, A Diego; Miller, R Michael; Rokhsar, Daniel; Schadt, Christopher Warren; Torok, Tamas; Tuskan, Gerald A; Bennett, Joan; Berka, Randy; Briggs, Steven; Heitman, Joseph; Taylor, John; Turgeon, Gillian; Werner-Washburne, Maggie; Himmel, Michael E

    2008-01-01

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions. Published by Elsevier Ltd on behalf of The British Mycological Society.

  4. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Schadt, Christopher Warren; Baker, Scott; Thykaer, Jette; Adney, William S; Brettin, Tom; Brockman, Fred; Dhaeseleer, Patrick; Martinez, A Diego; Miller, R Michael; Rokhsar, Daniel; Torok, Tamas; Tuskan, Gerald A; Bennett, Joan; Berka, Randy; Briggs, Steven; Heitman, Joseph; Rizvi, L; Taylor, John; Turgeon, Gillian; Werner-Washburne, Maggie; Himmel, Michael

    2008-01-01

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  5. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  6. Sequencing Centers Panel at SFAF

    SciTech Connect

    Schilkey, Faye; Ali, Johar; Grafham, Darren; Muzny, Donna; Fulton, Bob; Fitzgerald, Mike; Hostetler, Jessica; Daum, Chris

    2010-06-02

    From left to right: Faye Schilkey of NCGR, Johar Ali of OICR, Darren Grafham of Wellcome Trust Sanger Institute, Donna Muzny of the Baylor College of Medicine, Bob Fulton of Washington University, Mike Fitzgerald of the Broad Institute, Jessica Hostetler of the J. Craig Venter Institute and Chris Daum of the DOE Joint Genome Institute discuss sequencing technologies, applications and pipelines on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  7. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  8. Evidence from genome-wide simple sequence repeat markers for a polyphyletic origin and secondary centers of genetic diversity of Brassica juncea in China and India.

    PubMed

    Chen, Sheng; Wan, Zhenjie; Nelson, Matthew N; Chauhan, Jitendra S; Redden, Robert; Burton, Wayne A; Lin, Ping; Salisbury, Phillip A; Fu, Tingdong; Cowling, Wallace A

    2013-01-01

    The oilseed Brassica juncea is an important crop with a long history of cultivation in India and China. Previous studies have suggested a polyphyletic origin of B. juncea and more than one migration from the primary to secondary centers of diversity. We investigated molecular genetic diversity based on 99 simple sequence repeat markers in 119 oilseed B. juncea varieties from China, India, Europe, and Australia to test whether molecular differentiation follows Vavilov's proposal of secondary centers of diversity in India and China. Two distinct groups were identified by markers in the A genome, and the same two groups were confirmed by markers in the B genome. Group 1 included accessions from central and western India, in addition to those from eastern China. Group 2 included accessions from central and western China, as well as those from northern and eastern India. European and Australian accessions were found only in Group 2. Chinese accessions had higher allelic diversity per accession (Group 1) and more private alleles per accession (Groups 1 and 2) than those from India. The marker data and geographic distribution of Groups 1 and 2 were consistent with two independent migrations of B. juncea from its center of origin in the Middle East and neighboring regions along trade routes to western China and northern India, followed by regional adaptation. Group 1 migrated further south and west in India, and further east in China, than Group 2. Group 2 showed diverse agroecological adaptation, with yellow-seeded spring-sown types in central and western China and brown-seeded autumn-sown types in India. PMID:23519868

  9. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  10. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  11. Poultry Genome Sequences: Progress and Outstanding Challenges

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The first build of the chicken genome sequence appeared in March 2004 – the first genome sequence of any animal agriculture species. That sequence was done primarily by whole genome shotgun Sanger sequencing, along with the use of an extensive BAC contig-based physical map to assemble the sequence ...

  12. Whole genome sequencing in pharmacogenomics

    PubMed Central

    Katsila, Theodora

    2015-01-01

    Pharmacogenomics aims to shed light on the role of genes and genomic variants in clinical treatment response. Although several drug–gene relationships have been characterized to date, many challenges remain for the application of pharmacogenomics in the clinic; clinical guidelines for pharmacogenomic testing are still in their infancy, whereas the emerging high throughput genotyping technologies produce a tsunami of new findings. Herein, the potential of whole genome sequencing for pharmacogenomics research and clinical application is highlighted. PMID:25859217

  13. Almost finished: the complete genome sequence of Mycosphaerella graminicola

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycosphaerella graminicola causes septoria tritici blotch of wheat. An 8.9x shotgun sequence of bread wheat strain IPO323 was generated through the Community Sequencing Program of the U.S. Department of Energy’s Joint Genome Institute (JGI), and was finished at the Stanford Human Genome Center. The ...

  14. Sequencing and mapping of the onion genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  15. Genome Sequence of Mycobacteriophage Phayonce.

    PubMed

    Pope, Welkin H; Jacobetz, Emily; Johnson, Courtney A; Kihle, Brooke L; Sobeski, Margaret A; Werner, Madison B; Adkins, Nancy L; Kramer, Zachary J; Montgomery, Matthew T; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-01-01

    Mycobacteriophage Phayonce is a newly isolated phage recovered from a soil sample in Pittsburgh, PA, using Mycobacterium smegmatis mc²155 as a host. Phayonce's genome is 49,203 bp long and contains 77 protein-coding genes, 23 of them having predicted functions. Phayonce shares a strong similarity in nucleotide sequence with phages of cluster P. PMID:26089413

  16. Fusicladium effusum draft genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The pecan scab fungus (Fusicladium effusum [G. Winter]) is an economically important pathogen of pecan (Carya illinoinensis [Wangenh]. K. Koch), on account of its impact on yield and quality of valuable nutmeats. We describe the first draft genome sequence of F. effusum, the characteristics of annot...

  17. Genome Sequence of Mycobacteriophage Phayonce

    PubMed Central

    Jacobetz, Emily; Johnson, Courtney A.; Kihle, Brooke L.; Sobeski, Margaret A.; Werner, Madison B.; Adkins, Nancy L.; Kramer, Zachary J.; Montgomery, Matthew T.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Mycobacteriophage Phayonce is a newly isolated phage recovered from a soil sample in Pittsburgh, PA, using Mycobacterium smegmatis mc²155 as a host. Phayonce’s genome is 49,203 bp long and contains 77 protein-coding genes, 23 of them having predicted functions. Phayonce shares a strong similarity in nucleotide sequence with phages of cluster P. PMID:26089413

  18. Plant genome sequencing - applications for crop improvement.

    PubMed

    Bolger, Marie E; Weisshaar, Bernd; Scholz, Uwe; Stein, Nils; Usadel, Björn; Mayer, Klaus F X

    2014-04-01

    It is over 10 years since the genome sequence of the first crop was published. Since then, the number of crop genomes sequenced each year has increased steadily. The amazing pace at which genome sequences are becoming available is largely due to the improvement in sequencing technologies both in terms of cost and speed. Modern sequencing technologies allow the sequencing of multiple cultivars of smaller crop genomes at a reasonable cost. Though many of the published genomes are considered incomplete, they nevertheless have proved a valuable tool to understand important crop traits such as fruit ripening, grain traits and flowering time adaptation. PMID:24679255

  19. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC-by-BAC approaches were utilized first, and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  20. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc²155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  1. Draft Genome Sequences of the Onion Center Rot Pathogen Pantoea ananatis PA4 and Maize Brown Stalk Rot Pathogen P. ananatis BD442

    PubMed Central

    Weller-Stuart, Tania; Chan, Wai Yin; Venter, Stephanus N.; Smits, Theo H. M.; Duffy, Brion; Goszczynska, Teresa; Cowan, Don A.; de Maayer, Pieter

    2014-01-01

    Pantoea ananatis is an emerging phytopathogen that infects a broad spectrum of plant hosts. Here, we present the genomes of two South African isolates, P. ananatis PA4, which causes center rot of onion, and BD442, isolated from brown stalk rot of maize. PMID:25103759

  2. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  3. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research. PMID:25721271

  4. Draft Genome Sequences of Fungus Aspergillus calidoustus

    PubMed Central

    Horn, Fabian; Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A.; Petzke, Lutz

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  5. Draft Genome Sequences of Fungus Aspergillus calidoustus.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A; Petzke, Lutz; Valiante, Vito

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  6. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  7. The fungal genome initiative and lessons learned from genome sequencing.

    PubMed

    Cuomo, Christina A; Birren, Bruce W

    2010-01-01

    The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The "fungal genome initiative" represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal kingdom. Through such an approach, scientists can not only better understand specific organisms but also illuminate the shared and unique aspects of fungal biology that underlie the importance of fungi in biomedical research, health, food production, and industry. To date, assembled genomes for over 100 fungi are available in public databases, and many more sequencing projects are underway. Here, we discuss both examples of findings from comparative analysis of fungal sequences, with a specific emphasis on yeast genomes, and on the analytical approaches taken to mine fungal genomes. New sequencing methods are accelerating comparative studies of fungi by reducing the cost and difficulty of sequencing. This has driven more common use of sequencing applications, such as to study genome-wide variation in populations or to deeply profile RNA transcripts. These and further technological innovations will continue to be piloted in yeasts and other fungi, and will expand the applications of sequencing to study fungal biology. PMID:20946837

  8. Genome-tools: a flexible package for genome sequence analysis.

    PubMed

    Lee, William; Chen, Swaine L

    2002-12-01

    Genome-tools is a Perl module, a set of programs, and a user interface that facilitates access to genome sequence information. The package is flexible, extensible, and designed to be accessible and useful to both nonprogrammers and programmers. Any relatively well-annotated genome available with standard GenBank genome files may be used with genome-tools. A simple Web-based front end permits searching any available genome with an intuitive interface. Flexible design choices also make it simple to handle revised versions of genome annotation files as they change. In addition, programmers can develop cross-genomic tools and analyses with minimal additional overhead by combining genome-tools modules with newly written modules. Genome-tools runs on any computer platform for which Perl is available, including Unix, Microsoft Windows, and Mac OS. By simplifying the access to large amounts of genomic data, genome-tools may be especially useful for molecular biologists looking at newly sequenced genomes, for which few informatics tools are available. The genome-tools Web interface is accessible at http://genome-tools.sourceforge.net, and the source code is available at http://sourceforge.net/projects/genome-tools. PMID:12503321

  9. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  10. Draft Genome Sequence of Lactobacillus plantarum 2025

    PubMed Central

    Khlebnikov, Valentin C.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2016-01-01

    A draft genome sequence of Lactobacillus plantarum 2025 was derived using Ion Torrent sequencing technology. The total size of the assembly (3.33 Mb) was in agreement with the genome sizes of other strains of this species. The data will assist in revealing the genes responsible for the specific properties of this strain. PMID:26744375

  11. Draft Genome Sequence of Lactobacillus plantarum 2025.

    PubMed

    Karlyshev, Andrey V; Khlebnikov, Valentin C; Kosarev, Igor V; Abramov, Vyacheslav M

    2016-01-01

    A draft genome sequence of Lactobacillus plantarum 2025 was derived using Ion Torrent sequencing technology. The total size of the assembly (3.33 Mb) was in agreement with the genome sizes of other strains of this species. The data will assist in revealing the genes responsible for the specific properties of this strain. PMID:26744375

  12. Draft Genome Sequence of a Klebsiella pneumoniae Carbapenemase-Positive Sequence Type 111 Pseudomonas aeruginosa Strain

    PubMed Central

    Dotson, Gabrielle A.; Dekker, John P.; Palmore, Tara N.; Segre, Julia A.

    2016-01-01

    Here, we report the draft genome sequence of a sequence type 111 Pseudomonas aeruginosa strain isolated in 2014 from a patient at the NIH Clinical Center. This P. aeruginosa strain exhibits pan-drug resistance and harbors the blaKPC-2 gene, encoding the Klebsiella pneumoniae carbapenemase enzyme, on a plasmid. PMID:26868386

  13. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  14. Twenty years of bacterial genome sequencing.

    PubMed

    Loman, Nicholas J; Pallen, Mark J

    2015-12-01

    Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a 'sequencing singularity', where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond. PMID:26548914

  15. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated that this new isolate forms an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species: Psittaciform 2 bornavirus. PMID:26403158

  16. Sequence Maneuverer: tool for sequence extraction from genomes

    PubMed Central

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; Liaqat, Khurrum; Khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has opened new challenges in many aspects, particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are publicly available but have limitations with respect to sequence extraction, the length of the sequence to be extracted, organism specificity and the lack of a user-friendly interface. We have developed a Java-based software package having three modules which can be used independently or sequentially. The tool efficiently extracts sequences from large datasets in a few simple steps. It can efficiently extract multiple sequences of any desired length from a genome of any organism. The results are crosschecked against published data. Availability: URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx PMID:23275734

  17. P224-M Management Systems and Storage of Laboratory Information at Laval University Hospital Research Center Genomic Sequencing and Genotyping Platform

    PubMed Central

    Rodrigue, M. A.; Fortier, É.; Pagé, A. Gareau; Beaulieu, P.; Raymond, V.

    2007-01-01

    Between November 2005 and October 2006, a total of 106 academic and corporate researchers requested the services of the CRCHUL Genomic Sequencing and Genotyping Platform. All procedures are performed by only one manager and two research assistants. During this one-year period, we sequenced 205,000 DNA samples using only one ABI 3730xl DNA analyzer. We also genotyped 8200 samples using one ABI 3100 sequencer. The maximum delivery time was 48 h for all projects of less than 384 samples. These samples comprised 80% of all our orders. The management and integration of all these samples were conducted on a computer system that we developed in house. This laboratory information management system has significantly reduced processing and analysis time and concurrently increased production capacity. All data are stored on our Oracle 10G database installed on a UNIX server, which is connected to an interface application server module to increase security. The Web interfaces were developed in PL/SQL, which facilitates quick consultations, additions, and modifications. Our customers can thus easily submit their samples at www.sequences.crchul.ulaval.ca. All laboratory procedures are tracked and managed by two research assistants through our laboratory information management system (LIMS). This LIMS also allows the customers to follow the progress of their samples on line. Once the sequencing process is completed, all samples are analyzed by our personnel, a descriptive note is added to every sample, and an electronic notification is sent to the customer. Results are retrieved from our database by downloading a compressed file. Production of monthly financial statements is incorporated into our computer system, reducing administrative time and expenses with respect to management. In the near future, we will be integrating our most recent SNP genotyping service using the Sequenom MassArray into our information management system.

  18. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. PMID:26269219

  19. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  20. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  1. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  2. Sequencing the genome of the Atlantic salmon (Salmo salar)

    PubMed Central

    2010-01-01

    The International Collaboration to Sequence the Atlantic Salmon Genome (ICSASG) will produce a genome sequence that identifies and physically maps all genes in the Atlantic salmon genome and acts as a reference sequence for other salmonids. PMID:20887641

  3. Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes

    PubMed Central

    Barthelson, Roger; McFarlin, Adam J.; Rounsley, Steven D.; Young, Sarah

    2011-01-01

    Background: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. Methodology/Principal Findings: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. Conclusions/Significance: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further. PMID:22174807
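
    The abstract above mentions "statistics of the assembly sequences" without naming them. As an illustration only, the Python sketch below computes N50, a widely used assembly contiguity statistic; that Plantagora reported N50 is an assumption, and the function name and sample contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembly size.
    Illustrative helper; not taken from the Plantagora software."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) from two assemblies of the same reads.
    assembly_a = [2, 3, 4, 5, 6]
    assembly_b = [10, 10]
    print(n50(assembly_a), n50(assembly_b))
```

    A higher N50 for the same total assembly size indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract pairs such statistics with alignment-based fidelity measures.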

  4. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  5. Complementary DNA sequencing: Expressed sequence tags and human genome project

    SciTech Connect

    Adams, M.D.; Kelley, J.M.; Gocayne, J.D.; Dubnick, M.; Wu, A.; Olde, B.; Moreno, R.F.; Kerlavage, A.R.; McCombie, W.R.; Venter, J.C.; Polymeropoulos, M.H.; Hong Xiao; Merril, C.R.

    1991-06-21

    Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

  6. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  7. Streptococcal taxonomy based on genome sequence analyses

    PubMed Central

    2013-01-01

    The identification of the clinically relevant viridans streptococci group, at species level, is still problematic. The aim of this study was to extract taxonomic information from the complete genome sequences of 67 streptococci, comprising 19 species, by means of genomic analyses, multilocus sequence analysis (MLSA), average amino acid identity (AAI), genomic signatures, genome-to-genome distances (GGD) and codon usage bias. We then attempted to determine the usefulness of these genomic tools for species identification in streptococci. Our results showed that MLSA, AAI and GGD analyses are robust markers to identify streptococci at the species level, for instance, S. pneumoniae, S. mitis, and S. oralis. A Streptococcus species can be defined as a group of strains that share ≥ 95% DNA similarity in MLSA and AAI, and > 70% DNA identity in GGD. This approach allows an advanced understanding of bacterial diversity. PMID:24358875
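
    The species criterion quoted in the abstract above can be written as a simple predicate. The Python sketch below is illustrative only: the three thresholds come from the abstract, while the function name, parameter names, and example values are hypothetical and not taken from the study.

```python
def meets_species_thresholds(mlsa_pct, aai_pct, ggd_pct):
    """Check a pair of strains against the thresholds quoted in the abstract:
    >= 95% similarity in MLSA and AAI, and > 70% identity in GGD.
    All inputs are percentages (0-100) and are assumed to be precomputed."""
    return mlsa_pct >= 95.0 and aai_pct >= 95.0 and ggd_pct > 70.0


if __name__ == "__main__":
    # Hypothetical pairwise values, for illustration only.
    print(meets_species_thresholds(97.2, 96.1, 82.0))  # all thresholds met
    print(meets_species_thresholds(93.5, 91.0, 55.0))  # thresholds not met
```

    Note that the GGD comparison is strict (> 70%) while the MLSA and AAI comparisons are inclusive (≥ 95%), following the wording of the abstract.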

  8. Genome sequence of Coxiella burnetii strain Namibia.

    PubMed

    Walter, Mathias C; Öhrman, Caroline; Myrtennäs, Kerstin; Sjödin, Andreas; Byström, Mona; Larsson, Pär; Macellaro, Anna; Forsman, Mats; Frangoulidis, Dimitrios

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  9. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  10. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  11. Complete genome sequence of arracacha mottle virus.

    PubMed

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses. PMID:23001696

  12. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    SciTech Connect

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  13. Bacterial genome sequencing and drug discovery.

    PubMed

    Allsop, A E

    1998-12-01

    The availability of bacterial genome sequence information has opened up many new strategies for antibacterial drug hunting. There are obvious benefits for the identification and evaluation of new drug targets, but genomic-based technology is also beginning to provide new tools for the downstream, preclinical, optimisation of compounds. The greatest benefit from these new approaches lies in the ability to examine the entire genome (or several genomes) simultaneously and in total. In this way, one potential target can be evaluated against another, and either the total effects of functional impairment can be established or the effects of a compound can be compared across species. PMID:9889137

  14. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  15. Complete genome sequence of trivittatus virus.

    PubMed

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-10-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group and the Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which, interestingly, shows only a few differences when compared to partial sequences of modern strains. PMID:26212363

  16. Draft Genome Sequence of Goose Dicistrovirus.

    PubMed

    Greninger, Alexander L; Jerome, Keith R

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence within the nonstructural open reading frame (ORF) from aparavirus or cripavirus. PMID:26941149

  17. Draft Genome Sequence of Goose Dicistrovirus

    PubMed Central

    Jerome, Keith R.

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence within the nonstructural open reading frame (ORF) from aparavirus or cripavirus. PMID:26941149

  18. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group, and Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only few differences compared to partial sequences of modern strains. PMID:26212363

  19. Mining for single nucleotide polymorphisms in pig genome sequence data

    PubMed Central

    Kerstens, Hindrik HD; Kollers, Sonja; Kommadath, Arun; del Rosario, Marisol; Dibbits, Bert; Kinders, Sylvia M; Crooijmans, Richard P; Groenen, Martien AM

    2009-01-01

    Background Single nucleotide polymorphisms (SNPs) are ideal genetic markers due to their high abundance and the highly automated way in which SNPs are detected and SNP assays are performed. The number of SNPs identified in the pig thus far is still limited. Results A total of 4.8 million whole genome shotgun sequences obtained from the NCBI trace-repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project" were analysed for the presence of SNPs. Available BAC and BAC-end sequences and their naming and mapping information, all obtained from the Sanger Institute FTP site, served as a rough assembly of a reference genome. In 1.2 Gb of pig genome sequence, we identified 98,151 SNPs in which one of the sequences in the alignment represented the polymorphism and 6,374 SNPs in which two sequences represent an identical polymorphism. To benchmark the SNP identification method, 163 SNPs, in which the polymorphism was represented twice in the sequence alignment, were selected and tested on a panel of three purebred boar lines and wild boar. Of these 163 in silico identified SNPs, 134 were shown to be polymorphic in our animal panel. Conclusion This SNP identification method, which mines for SNPs in publicly available porcine shotgun sequence repositories, provides thousands of high quality SNPs. Benchmarking in an animal panel showed that more than 80% of the predicted SNPs represented true genetic variation. PMID:19126189
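The "more than 80%" figure in this abstract follows directly from the counts it reports. The short calculation below reproduces it; the variable names are illustrative, and the rate applies only to the class of SNPs that was benchmarked (those represented twice in the alignment).

```python
# Counts quoted in the abstract.
single_read_snps = 98_151   # polymorphism seen in one sequence of the alignment
double_read_snps = 6_374    # identical polymorphism seen in two sequences
tested = 163                # twice-represented SNPs selected for benchmarking
confirmed = 134             # of those, polymorphic in the animal panel

total_candidates = single_read_snps + double_read_snps
validation_rate = confirmed / tested

print(f"candidate SNPs: {total_candidates}")
print(f"validation rate: {validation_rate:.1%}")
```

This prints 104525 candidate SNPs and a validation rate of 82.2%, consistent with the abstract's conclusion.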

  20. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  1. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.

  2. Genome Sequence of Mycobacteriophage Cabrinians

    PubMed Central

    Chudoff, Dylan; Conboy, Andrew; Conboy, Danielle; Atoulelou, Mireille; Hasan, Sakina; Martinez, Alexandria; Mastrando, Jessica; Roy, Renoy; Schmidt, Robert; Sheed, Kabreeze; Smith, Jewel; Sperratore, Morgan; Struga, Rexhina; Starr, Katelyn; Suppi, Regina; Uguru, Ugo; Terry, Katrina; Villafuerte, Rosendo; Yuan, Vanessa

    2016-01-01

    Mycobacteriophage Cabrinians is a newly isolated phage capable of infecting both Mycobacterium phlei and Mycobacterium smegmatis and was recovered from a soil sample in New York City, NY. Cabrinians has a genome length of 56,669 bp, encodes 101 predicted proteins, and is a member of mycobacteriophages in cluster F. PMID:26847904

  3. Genome Sequence of Mycobacteriophage Mindy.

    PubMed

    Pope, Welkin H; Bernstein, Nicholas I; Fasolas, Christina S; Mezghani, Nadia; Pressimone, Catherine A; Selvakumar, Priyanga; Stanton, Ann-Catherine J; Lapin, Jonathan S; Prout, Ashley K; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-01-01

    Mycobacteriophage Mindy is a newly isolated phage of Mycobacterium smegmatis, recovered from a soil sample in Pittsburgh, Pennsylvania, USA. Mindy has a genome length of 75,796 bp, encodes 147 predicted proteins and two tRNAs, and is closely related to mycobacteriophages in cluster E. PMID:26089411

  4. Genome Sequence of Mycobacteriophage Cabrinians.

    PubMed

    Chudoff, Dylan; Conboy, Andrew; Conboy, Danielle; Atoulelou, Mireille; Hasan, Sakina; Martinez, Alexandria; Mastrando, Jessica; Roy, Renoy; Schmidt, Robert; Sheed, Kabreeze; Smith, Jewel; Sperratore, Morgan; Struga, Rexhina; Starr, Katelyn; Suppi, Regina; Uguru, Ugo; Terry, Katrina; Villafuerte, Rosendo; Yuan, Vanessa; Dunbar, David

    2016-01-01

    Mycobacteriophage Cabrinians is a newly isolated phage capable of infecting both Mycobacterium phlei and Mycobacterium smegmatis and was recovered from a soil sample in New York City, NY. Cabrinians has a genome length of 56,669 bp, encodes 101 predicted proteins, and is a member of mycobacteriophages in cluster F. PMID:26847904

  5. Genome Sequence of Mycobacteriophage Mindy

    PubMed Central

    Bernstein, Nicholas I.; Fasolas, Christina S.; Mezghani, Nadia; Pressimone, Catherine A.; Selvakumar, Priyanga; Stanton, Ann-Catherine J.; Lapin, Jonathan S.; Prout, Ashley K.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Mycobacteriophage Mindy is a newly isolated phage of Mycobacterium smegmatis, recovered from a soil sample in Pittsburgh, Pennsylvania, USA. Mindy has a genome length of 75,796 bp, encodes 147 predicted proteins and two tRNAs, and is closely related to mycobacteriophages in cluster E. PMID:26089411

  6. Standardized Metadata for Human Pathogen/Vector Genomic Sequences

    PubMed Central

    Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

  7. Standardized metadata for human pathogen/vector genomic sequences.

    PubMed

    Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

  8. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  9. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  10. Complete genome sequence and genomic characterization of Microcystis panniformis FACHB 1757 by third-generation sequencing.

    PubMed

    Zhang, Jun-Yi; Guan, Rui; Zhang, Hu-Jun; Li, Hua; Xiao, Peng; Yu, Gong-Liang; Du, Lei; Cao, De-Min; Zhu, Bing-Chuan; Li, Ren-Hui; Lu, Zu-Hong

    2016-01-01

    The cyanobacterial genus Microcystis is well known as the main group that forms harmful blooms in water. A strain of Microcystis, M. panniformis FACHB1757, was isolated from Meiliang Bay of Lake Taihu in August 2011. The whole genome was sequenced using PacBio RS II sequencer with 48-fold coverage. The complete genome sequence with no gaps contained a 5,686,839 bp chromosome and a 38,683 bp plasmid, which coded for 6,519 and 49 proteins, respectively. Comparison with strains of M. aeruginosa and some other water bloom-forming cyanobacterial species revealed large-scale structure rearrangement and length variation at the genome level along with 36 genomic islands annotated genome-wide, which demonstrates high plasticity of the M. panniformis FACHB1757 genome and reveals that Microcystis has a flexible genome evolution. PMID:26823957

  11. Accelerating Genome Sequencing 100X with FPGAs

    SciTech Connect

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

  12. Microbial species delineation using whole genome sequences.

    PubMed

    Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita

    2015-08-18

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
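The abstract mentions complete linkage clustering of genome pairs. As a rough illustration of that general technique only, the sketch below clusters genomes so that every pair within a cluster meets a similarity cutoff. It is not the MiSI implementation: it collapses the AF and gANI criteria into a single hypothetical similarity score, and the genome names, scores, and cutoff are invented for the example.

```python
from itertools import combinations


def complete_linkage(names, sim, cutoff):
    """Naive complete-linkage clustering on a pairwise similarity table.

    names  : list of genome identifiers (hypothetical)
    sim    : dict mapping frozenset({a, b}) -> similarity score
    cutoff : minimum similarity every pair in a cluster must reach
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(a, b):
        # Complete linkage: a cluster pair is only as similar as its
        # least similar pair of members.
        return min(sim[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest complete-linkage score.
        best = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(*best) < cutoff:
            break  # no remaining pair can be merged without violating the cutoff
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters


# Invented scores for three genomes: A and B are close, C is close to A only.
scores = {
    frozenset(("A", "B")): 98.0,
    frozenset(("A", "C")): 97.0,
    frozenset(("B", "C")): 90.0,
}
result = complete_linkage(["A", "B", "C"], scores, cutoff=96.0)
print(sorted(sorted(c) for c in result))
```

With these invented scores, A and B merge, but C stays separate because its similarity to B falls below the cutoff; this is the property that distinguishes complete linkage from single linkage.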

  13. Microbial species delineation using whole genome sequences

    PubMed Central

    Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

    2015-01-01

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420

  14. Using comparative genomics to reorder the human genome sequence into a virtual sheep genome

    PubMed Central

    Dalrymple, Brian P; Kirkness, Ewen F; Nefedov, Mikhail; McWilliam, Sean; Ratnakumar, Abhirami; Barris, Wes; Zhao, Shaying; Shetty, Jyoti; Maddox, Jillian F; O'Grady, Margaret; Nicholas, Frank; Crawford, Allan M; Smith, Tim; de Jong, Pieter J; McEwan, John; Oddy, V Hutton; Cockett, Noelle E

    2007-01-01

    Background Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes? Results A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the human, dog, and cow genomes. To maximize genome coverage, the coordinates of all BAC end sequence hits to the cow and dog genomes were also converted to the equivalent human genome coordinates. The 84,624 sheep BACs (about 5.4-fold genome coverage) with paired ends in the correct orientation (tail-to-tail) and spacing, combined with information from sheep BAC comparative genome contigs (CGCs) built separately on the dog and cow genomes, were used to construct 1,172 sheep BAC-CGCs, covering 91.2% of the human genome. Clustered non-tail-to-tail and outsize BACs located close to the ends of many BAC-CGCs linked BAC-CGCs covering about 70% of the genome to at least one other BAC-CGC on the same chromosome. Using the BAC-CGCs, the intrachromosomal and interchromosomal BAC-CGC linkage information, human/cow and vertebrate synteny, and the sheep marker map, a virtual sheep genome was constructed. To identify BACs potentially located in gaps between BAC-CGCs, an additional set of 55,668 sheep BACs were positioned on the sheep genome with lower confidence. A coordinate conversion process allowed us to transfer human genes and other genome features to the virtual sheep genome to display on a sheep genome browser. Conclusion We demonstrate that limited sequencing of BACs combined with positioning on a well assembled genome and integrating locations from other less well assembled genomes can yield extensive, detailed subgene-level maps of mammalian genomes, for which genomic resources are currently limited. PMID:17663790

  15. Sequence Pattern Recognition in Genome Analysis

    NASA Astrophysics Data System (ADS)

    Luo, Liaofu; Lu, Jun

    2007-12-01

    The problem of pattern recognition in genome analysis is studied. How the sequence information is extracted and integrated in the approach to sequence pattern recognition is discussed in detail. We propose two methods for calculation and prediction. The first is the Information Deviation Measure with Quadratic Discriminant (IDQD) and the second is the Information Deviation Measure with U-transformation Discriminant (IDUD). The former applies when the sequence information obeys a Gaussian-type distribution, and the latter can be used for more general statistical distributions of sequence information.

  16. Genome Walking by Next Generation Sequencing Approaches

    PubMed Central

    Volpicella, Mariateresa; Leoni, Claudia; Costanza, Alessandra; Fanizza, Immacolata; Placido, Antonio; Ceci, Luigi R.

    2012-01-01

    Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes. PMID:24832505

  17. An International Plan to Sequence the Onion Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  18. Complete genome sequence of Caulobacter crescentus.

    PubMed

    Nierman, W C; Feldblyum, T V; Laub, M T; Paulsen, I T; Nelson, K E; Eisen, J A; Heidelberg, J F; Alley, M R; Ohta, N; Maddock, J R; Potocka, I; Nelson, W C; Newton, A; Stephens, C; Phadke, N D; Ely, B; DeBoy, R T; Dodson, R J; Durkin, A S; Gwinn, M L; Haft, D H; Kolonay, J F; Smit, J; Craven, M B; Khouri, H; Shetty, J; Berry, K; Utterback, T; Tran, K; Wolf, A; Vamathevan, J; Ermolaeva, M; White, O; Salzberg, S L; Venter, J C; Shapiro, L; Fraser, C M; Eisen, J

    2001-03-27

    The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus. PMID:11259647

  19. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  20. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  1. Genome Sequence of Lactobacillus versmoldensis KCTC 3814

    PubMed Central

    Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Nam, Seong-Hyeuk; Kang, Aram; Kim, Aeri; Park, Hong-Seog

    2011-01-01

    Lactobacillus versmoldensis KCTC 3814 was isolated from raw fermented poultry salami. The species was present in high numbers and frequently dominated the lactic acid bacteria (LAB) populations of the products. Here, we announce the draft genome sequence of Lactobacillus versmoldensis KCTC 3814, isolated from poultry salami, and describe major findings from its annotation. PMID:21914893

  2. Genome sequence of Lactobacillus crispatus ST1.

    PubMed

    Ojala, Teija; Kuparinen, Veera; Koskinen, J Patrik; Alatalo, Edward; Holm, Liisa; Auvinen, Petri; Edelman, Sanna; Westerlund-Wikström, Benita; Korhonen, Timo K; Paulin, Lars; Kankainen, Matti

    2010-07-01

    Lactobacillus crispatus is a common member of the beneficial microbiota present in the vertebrate gastrointestinal and human genitourinary tracts. Here, we report the genome sequence of L. crispatus ST1, a chicken isolate displaying strong adherence to vaginal epithelial cells. PMID:20435723

  3. Genome Sequence of Salmonella Phage 9NA.

    PubMed

    Casjens, Sherwood R; Leavitt, Justin C; Hatfull, Graham F; Hendrix, Roger W

    2014-01-01

    The virulent double-stranded DNA (dsDNA) bacteriophage 9NA infects Salmonella enterica serovar Typhimurium and has a long noncontractile tail. We report its complete 52,869-bp genome sequence. Phage 9NA and two closely related S. enterica serovar Newport phages represent a tailed phage type whose molecular lifestyle has not yet been studied in detail. PMID:25146133

  4. Genome Sequence of Salmonella Phage 9NA

    PubMed Central

    Leavitt, Justin C.; Hatfull, Graham F.; Hendrix, Roger W.

    2014-01-01

    The virulent double-stranded DNA (dsDNA) bacteriophage 9NA infects Salmonella enterica serovar Typhimurium and has a long noncontractile tail. We report its complete 52,869-bp genome sequence. Phage 9NA and two closely related S. enterica serovar Newport phages represent a tailed phage type whose molecular lifestyle has not yet been studied in detail. PMID:25146133

  5. Genome Sequence of Salmonella Phage χ.

    PubMed

    Hendrix, Roger W; Ko, Ching-Chung; Jacobs-Sera, Deborah; Hatfull, Graham F; Erhardt, Marc; Hughes, Kelly T; Casjens, Sherwood R

    2015-01-01

    Salmonella bacteriophage χ is a member of the Siphoviridae family that gains entry into its host cells by adsorbing to their flagella. We report the complete 59,578-bp sequence of the genome of phage χ, which together with its relatives, exemplifies a largely unexplored type of tailed bacteriophage. PMID:25720684

  6. Genome Sequence of Salmonella Phage χ

    PubMed Central

    Ko, Ching-Chung; Jacobs-Sera, Deborah; Hatfull, Graham F.; Erhardt, Marc; Hughes, Kelly T.; Casjens, Sherwood R.

    2015-01-01

    Salmonella bacteriophage χ is a member of the Siphoviridae family that gains entry into its host cells by adsorbing to their flagella. We report the complete 59,578-bp sequence of the genome of phage χ, which together with its relatives, exemplifies a largely unexplored type of tailed bacteriophage. PMID:25720684

  7. Genome Sequence of Corynebacterium ulcerans Strain 210932

    PubMed Central

    Viana, Marcus Vinicius Canário; de Jesus Benevides, Leandro; Batista Mariano, Diego Cesar; de Souza Rocha, Flávia; Bagano Vilas Boas, Priscilla Carolinne; Folador, Edson Luiz; Pereira, Felipe Luiz; Alves Dorella, Fernanda; Gomes Leal, Carlos Augusto; Fiorini de Carvalho, Alex; Silva, Artur; de Castro Soares, Siomar; Pereira Figueiredo, Henrique Cesar; Guimarães, Luis Carlos

    2014-01-01

    In this work, we present the complete genome sequence of Corynebacterium ulcerans strain 210932, isolated from a human. The species is an emergent pathogen that infects a variety of wild and domesticated animals and humans. It is associated with a growing number of cases of a diphtheria-like disease around the world. PMID:25428977

  8. Whole Genome Sequence of a Turkish Individual

    PubMed Central

    Dogan, Haluk; Can, Handan; Otu, Hasan H.

    2014-01-01

    Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We have performed 35x coverage using paired-end sequencing, where over 95% of sequencing reads are mapped to the reference genome covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homo/hetero zygosity (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion ratios (2,383,204∶1,154,590 or 2.06∶1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from −52 bp to 34 bp in length and causing about 180 codon insertion/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale. PMID:24416366
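
    The ratios quoted in this abstract can be checked directly from the counts it reports. The short Python sketch below is only an arithmetic illustration: the variable names are hypothetical and nothing here reflects the authors' actual pipeline. It also confirms that each pair of counts sums to the reported 3,537,794 SNP calls.

```python
# Counts copied from the abstract above; variable names are illustrative only.
total_snps = 3_537_794
homozygous, heterozygous = 1_415_123, 2_122_671
transitions, transversions = 2_383_204, 1_154_590

# Both breakdowns should partition the full set of SNP calls.
assert homozygous + heterozygous == total_snps
assert transitions + transversions == total_snps

# Heterozygous calls per homozygous call (the abstract's "1:1.5").
het_per_hom = heterozygous / homozygous

# Transition/transversion ratio (the abstract's "2.06:1").
ti_tv = transitions / transversions

print(f"hom:het = 1:{het_per_hom:.2f}")
print(f"Ti/Tv   = {ti_tv:.2f}:1")
```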

  9. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  10. The Theory and Practice of Genome Sequence Assembly.

    PubMed

    Simpson, Jared T; Pop, Mihai

    2015-01-01

    The current genomic revolution was made possible by joint advances in genome sequencing technologies and computational approaches for analyzing sequence data. The close interaction between biologists and computational scientists is perhaps most apparent in the development of approaches for sequencing entire genomes, a feat that would not be possible without sophisticated computational tools called genome assemblers (short for genome sequence assemblers). Here, we survey the key developments in algorithms for assembling genome sequences since the development of the first DNA sequencing methods more than 35 years ago. PMID:25939056

  11. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create an expressed sequence tag, or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.

  12. Dominant short repeated sequences in bacterial genomes.

    PubMed

    Avershina, Ekaterina; Rudi, Knut

    2015-03-01

    We use a novel multidimensional searching approach to present the first exhaustive search for all possible repeated sequences in 166 genomes selected to cover the bacterial domain. We found an overrepresentation of repeated sequences in all but one of the genomes. The most prevalent repeats by far were related to clustered regularly interspaced short palindromic repeats (CRISPRs)—conferring bacterial adaptive immunity. We identified a deep branching clade of thermophilic Firmicutes containing the highest number of CRISPR repeats. We also identified a high prevalence of tandem repeated heptamers. In addition, we identified GC-rich repeats that could potentially be involved in recombination events. Finally, we identified repeats in a 16,322-amino-acid mega protein (involved in biofilm formation) and inverted repeats flanking miniature transposable elements (MITEs). In conclusion, the exhaustive search for repeated sequences identified new elements and distribution of these, which has implications for understanding both the ecology and evolution of bacteria. PMID:25561351
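
    To make the idea of searching for repeated sequences and tandem heptamers concrete, here is a minimal Python sketch. It is not the multidimensional search described in the abstract; it is a plain k-mer count plus a left-to-right tandem scan, and the function names and toy sequence are assumptions introduced purely for illustration.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return {k-mer: count} for every k-mer seen at least twice (overlapping windows)."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def tandem_runs(sequence, k, min_copies=2):
    """Return (start, unit, copies) for runs where a k-mer is directly followed by copies of itself."""
    runs = []
    i = 0
    while i + k <= len(sequence):
        unit = sequence[i:i + k]
        copies = 1
        # Extend the run while the next k characters repeat the unit exactly.
        while sequence[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            runs.append((i, unit, copies))
            i += copies * k  # skip past the whole run
        else:
            i += 1
    return runs


# Toy sequence (hypothetical): a heptamer repeated three times in tandem.
toy = "TTT" + "GATTACA" * 3 + "CCGG"
print(repeated_kmers(toy, 7)["GATTACA"])
print(tandem_runs(toy, 7))
```

    A real genome-scale search would need a memory-efficient index (for example a suffix array) and would also consider reverse complements; this sketch ignores both.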

  13. Genome, Epigenome and RNA sequences of Monozygotic Twins Discordant for Multiple Sclerosis

    SciTech Connect

    Miller, Neil

    2010-06-02

    Neil Miller, Deputy Director of Software Engineering at the National Center for Genome Resources, discusses a monozygotic twin study on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  14. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  15. Defining Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    Chain, Patrick

    2009-05-27

    Patrick Chain of the DOE Joint Genome Institute gives a talk on behalf of the International Genome Sequencing Standards Consortium on the need for intermediate genome classifications between "draft" and "finished"

  16. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that point to new applications for bacterial genomics. PMID:24143115

  17. Draft Genome Sequence of Mycobacterium brumae ATCC 51384

    PubMed Central

    D'Auria, Giuseppe

    2016-01-01

    Here, we report the draft genome sequence of Mycobacterium brumae type strain ATCC 51384. This is the first draft genome sequence of M. brumae, a nonpathogenic, rapidly growing, nonchromogenic mycobacterium, with immunotherapeutic capacities. PMID:27125480

  18. Draft Genome Sequence of Mycobacterium brumae ATCC 51384.

    PubMed

    D'Auria, Giuseppe; Torrents, Eduard; Luquin, Marina; Comas, Iñaki; Julián, Esther

    2016-01-01

    Here, we report the draft genome sequence of Mycobacterium brumae type strain ATCC 51384. This is the first draft genome sequence of M. brumae, a nonpathogenic, rapidly growing, nonchromogenic mycobacterium, with immunotherapeutic capacities. PMID:27125480

  19. Secure distributed genome analysis for GWAS and sequence comparison computation

    PubMed Central

    2015-01-01

    Background: The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods: In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results: We provide implementation results of our techniques based on secret sharing that demonstrate the practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions: This work describes our techniques, findings, and experimental results developed and obtained as part of the iDASH 2015 research competition to secure real-life genomic computations and shows the feasibility of securely computing with genomic data in practice. PMID:26733307
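
    For orientation, the two statistics named in this abstract can be written down in the clear. The Python sketch below computes a minor allele frequency and a 2x2 allelic chi-squared statistic on made-up counts. It is deliberately not secure and is not the authors' secret-sharing protocol; a secure protocol would evaluate the same formulas without revealing the inputs. All names and numbers are hypothetical.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the rarer allele at a biallelic site, given {allele: count}."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical allele counts: rows are cases/controls, columns are major/minor allele.
case_major, case_minor = 60, 40
control_major, control_minor = 80, 20

maf = minor_allele_frequency({
    "major": case_major + control_major,
    "minor": case_minor + control_minor,
})
chi2 = chi_squared_2x2(case_major, case_minor, control_major, control_minor)

print(f"MAF = {maf:.2f}")
print(f"chi-squared = {chi2:.4f}")
```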

  20. Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

    PubMed Central

    Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

    2012-01-01

    We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480

  1. A comparison of virus genome sequences with their host silkworm, Bombyx mori.

    PubMed

    Tang, Xu-Dong; Yue, Ya-Jie; Wang, Wei; Li, Nan; Shen, Zhong-Yuan

    2016-01-15

    With the recent availability of the genomes of many viruses and the silkworm, Bombyx mori, as well as a variety of Basic Local Alignment Search Tool (BLAST) programs, a new opportunity to gain insight into the interaction of viruses with the silkworm is possible. This study aims to determine the possible existence of sequence identities between the genomes of viruses and the silkworm and attempts to explain this phenomenon. BLAST searches of the genomes of viruses against the silkworm genome were performed using the resources of the National Center for Biotechnology Information. All studied viruses contained variable numbers of short regions with sequence identity to the genome of the silkworm. The short regions of sequence identity in the genome of the silkworm may be derived from the genomes of viruses in the long history of silkworm-virus interaction. This study is the first to compare these genomes, and may contribute to research on the interaction between viruses and the silkworm. PMID:26432002

  2. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  3. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    SciTech Connect

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N₂ as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H₂. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  4. Complete Genome Sequences of 138 Mycobacteriophages

    PubMed Central

    2012-01-01

    Bacteriophages are the most numerous biological entities in the biosphere, and although their genetic diversity is high, it remains ill defined. Mycobacteriophages—the viruses of mycobacterial hosts—provide insights into this diversity as well as tools for manipulating Mycobacterium tuberculosis. We report here the complete genome sequences of 138 new mycobacteriophages, which—together with the 83 mycobacteriophages previously reported—represent the largest collection of phages known to infect a single common host, Mycobacterium smegmatis mc²155. PMID:22282335

  5. Genome Technology Center at the NYU Langone Medical Center: New Support for Clinical and Translational Science

    PubMed Central

    Mische, S.; Zavadil, J.

    2011-01-01

    To significantly enhance support for clinical and translational research within the framework of its CTSI, the NYU Langone Medical Center consolidated the Microarray and DNA Sequencing Cores into a new Genome Technology Center, a shared resource overseen by the Office for Collaborative Science. The GTC's team of 4 technical personnel and one faculty level director assists >120 NYULMC laboratories in their basic, clinical and translational research. The Sequencing Unit operates 2 Illumina GAIIs, and a HiSeq sequencer will be added in Q1 2011. The GAII capacity is applied to research applications (ChIP-seq, small-RNA-seq and RIP-seq) and to identification of disease-related genome-level structure changes and correlates (e.g. RNA-seq of cancer transcriptomes). GTC also has a Roche GS FLX System (454) used for de novo sequencing of microbial species and for amplicon sequencing in clinical genetics, patient microbiome diversity, etc. The Microarray Unit operates Affymetrix GeneChip system and high-capacity QPCR (ABI 7900HT) with automated plate setup and loading for gene and microRNA profiling and for SNP genotyping in clinical genetics. The GTC cooperates closely with the newly established Center for Health Informatics and Bioinformatics (CHIBI) supported by the NIH/NCRR CTSA Award. CHIBI provides an HPC facility for sequencing and microarray data storage and offers a full range of informatics services. The GTC is committed to regional and nationwide collaborations with other Cores. GTC participates in the activities of the Genomic Analysis and Technology Excellence (GATE) Working Group of the Academy for Medical Development and Collaboration (AMDeC), particularly in the sections of Core Facility Directors, Funding Strategy and Bioinformatics. It also contributes to the AMDeC Facilities Instrumentation Resources Services Technologies (FIRST), a real-time online database of biomedical research technology and resources available in the New York City area and throughout Northeastern US. Key services of the GTC are offered to external clients.

  6. Assessing the Costs and Cost-Effectiveness of Genomic Sequencing

    PubMed Central

    Christensen, Kurt D.; Dukhovny, Dmitry; Siebert, Uwe; Green, Robert C.

    2015-01-01

    Despite dramatic drops in DNA sequencing costs, concerns are great that the integration of genomic sequencing into clinical settings will drastically increase health care expenditures. This commentary presents an overview of what is known about the costs and cost-effectiveness of genomic sequencing. We discuss the cost of germline genomic sequencing, addressing factors that have facilitated the decrease in sequencing costs to date and anticipating the factors that will drive sequencing costs in the future. We then address the cost-effectiveness of diagnostic and pharmacogenomic applications of genomic sequencing, with an emphasis on the implications for secondary findings disclosure and the integration of genomic sequencing into general patient care. Throughout, we ground the discussion by describing efforts in the MedSeq Project, an ongoing randomized controlled clinical trial, to understand the costs and cost-effectiveness of integrating whole genome sequencing into cardiology and primary care settings. PMID:26690481

  7. Whole genome sequencing for lung cancer

    PubMed Central

    Goh, Felicia; Wright, Casey M; Sriram, Krishna B; Relan, Vandana; Clarke, Belinda E; Duhig, Edwina E; Bowman, Rayleen V; Yang, Ian A; Fong, Kwun M

    2012-01-01

    Lung cancer is a leading cause of cancer related morbidity and mortality globally, and carries a dismal prognosis. Improved understanding of the biology of cancer is required to improve patient outcomes. Next-generation sequencing (NGS) is a powerful tool for whole genome characterisation, enabling comprehensive examination of somatic mutations that drive oncogenesis. Most NGS methods are based on polymerase chain reaction (PCR) amplification of platform-specific DNA fragment libraries, which are then sequenced. These techniques are well suited to high-throughput sequencing and are able to detect the full spectrum of genomic changes present in cancer. However, they require considerable investments in time, laboratory infrastructure, computational analysis and bioinformatic support. Next-generation sequencing has been applied to studies of the whole genome, exome, transcriptome and epigenome, and is changing the paradigm of lung cancer research and patient care. The results of this new technology will transform current knowledge of oncogenic pathways and provide molecular targets of use in the diagnosis and treatment of cancer. Somatic mutations in lung cancer have already been identified by NGS, and large scale genomic studies are underway. Personalised treatment strategies will improve care for those likely to benefit from available therapies, while sparing others the expense and morbidity of futile intervention. Organisational, computational and bioinformatic challenges of NGS are driving technological advances as well as raising ethical issues relating to informed consent and data release. Differentiation between driver and passenger mutations requires careful interpretation of sequencing data. Challenges in the interpretation of results arise from the types of specimens used for DNA extraction, sample processing techniques and tumour content. Tumour heterogeneity can reduce power to detect mutations implicated in oncogenesis. Next-generation sequencing will facilitate investigation of the biological and clinical implications of such variation. These techniques can now be applied to single cells and free circulating DNA, and possibly in the future to DNA obtained from body fluids and from subpopulations of tumour. As costs reduce, and speed and processing accuracy increase, NGS technology will become increasingly accessible to researchers and clinicians, with the ultimate goal of improving the care of patients with lung cancer. PMID:22833821

  8. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  9. NCI Center of Excellence in Integrative Cancer Biology and Genomics

    Cancer.gov

    The Center of Excellence in Integrative Cancer Biology and Genomics (CEICBG) is one of four Centers of Excellence established within the NCI Intramural Research Program (IRP). The Centers of Excellence build upon existing structures

  10. Whole Chloroplast Genome Sequencing in Fragaria Using Deep Sequencing: A Comparison of Three Methods

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chloroplast sequences previously investigated in Fragaria revealed low amounts of variation. Deep sequencing technologies enable economical sequencing of complete chloroplast genomes. These sequences can potentially provide robust phylogenetic resolution, even at low taxonomic levels within plant gr...

  11. Complete genome sequence of bacteriophage T5.

    PubMed

    Wang, Jianbin; Jiang, Yan; Vincent, Myriam; Sun, Yongqiao; Yu, Hong; Wang, Jing; Bao, Qiyu; Kong, Huimin; Hu, Songnian

    2005-02-01

    The 121,752-bp genome sequence of bacteriophage T5 was determined; the linear, double-stranded DNA is nicked in one of the strands and has large direct terminal repeats of 10,139 bp (8.3%) at both ends. The genome structure is consistently arranged according to its lytic life cycle. Of the 168 potential open reading frames (ORFs), 61 were annotated; these annotated ORFs are mainly enzymes involved in phage DNA replication, repair, and nucleotide metabolism. At least five endonucleases believed to help induce nicks in T5 genomic DNA were found, and a DNA ligase gene was found to be split into two separate ORFs. Analysis of T5 early promoters suggests a probable motif AAA{3, 4 T}nTTGCTT{17, 18 n}TATAATA{12, 13 W}{10 R} for strong promoters that may strengthen the step modification of host RNA polymerase, and thus control transcription of phage DNA. The distinct protein domain profile and a mosaic genome structure suggest an origin from the common genetic pool. PMID:15661140
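The promoter motif and the terminal-repeat percentage quoted in this abstract can be illustrated with a short Python sketch. It assumes one reading of the abstract's notation, namely that "{3, 4 T}" means three or four T residues, "n" is any base, and W and R are the IUPAC codes for A/T and A/G. The names `T5_EARLY_PROMOTER`, `synthetic_site` and `repeat_percent` are hypothetical, and the test sequence is synthetic, not taken from the T5 genome.

```python
import re

# IUPAC codes used by the motif: W = A or T, R = A or G, N = any base.
IUPAC = {"W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

# Assumed interpretation of AAA{3, 4 T}nTTGCTT{17, 18 n}TATAATA{12, 13 W}{10 R}.
T5_EARLY_PROMOTER = re.compile(
    "AAA"
    + "T{3,4}"
    + IUPAC["N"]
    + "TTGCTT"
    + IUPAC["N"] + "{17,18}"
    + "TATAATA"
    + IUPAC["W"] + "{12,13}"
    + IUPAC["R"] + "{10}"
)

# Synthetic sequence built to satisfy the pattern (illustration only).
synthetic_site = (
    "AAA" + "TTT" + "C" + "TTGCTT"
    + "ACGT" * 4 + "A"      # 17 arbitrary bases
    + "TATAATA"
    + "AT" * 6              # 12 W positions
    + "AG" * 5              # 10 R positions
)
match = T5_EARLY_PROMOTER.fullmatch(synthetic_site)

# Terminal repeat length as a share of the genome, from the figures in the abstract.
genome_bp = 121_752
terminal_repeat_bp = 10_139
repeat_percent = round(100 * terminal_repeat_bp / genome_bp, 1)

print(bool(match), repeat_percent)
```

Under these assumptions the synthetic site matches in full, and the repeat share rounds to the 8.3% stated in the abstract.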

  12. BorreliaBase: a phylogeny-centered browser of Borrelia genomes

    PubMed Central

    2014-01-01

    Background The bacterial genus Borrelia (phylum Spirochaetes) consists of two groups of pathogens represented respectively by B. burgdorferi, the agent of Lyme borreliosis, and B. hermsii, the agent of tick-borne relapsing fever. The number of publicly available Borrelia genomic sequences is growing rapidly with the discovery and sequencing of Borrelia strains worldwide. There is however a lack of dedicated online databases to facilitate comparative analyses of Borrelia genomes. Description We have developed BorreliaBase, an online database for comparative browsing of Borrelia genomes. The database is currently populated with sequences from 35 genomes of eight Lyme-borreliosis (LB) group Borrelia species and 7 Relapsing-fever (RF) group Borrelia species. Distinct from genome repositories and aggregator databases, BorreliaBase serves manually curated comparative-genomic data including genome-based phylogeny, genome synteny, and sequence alignments of orthologous genes and intergenic spacers. Conclusions With a genome phylogeny at its center, BorreliaBase allows online identification of hypervariable lipoprotein genes, potential regulatory elements, and recombination footprints by providing evolution-based expectations of sequence variability at each genomic locus. The phylo-centric design of BorreliaBase (http://borreliabase.org) is a novel model for interactive browsing and comparative analysis of bacterial genomes online. PMID:24994456

  13. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs have been conserved during evolution; all but one of these, which exists in both liverworts and mosses, are shared only by the two liverworts, the two mosses or the two hornworts. SSRs are known as DNA tracts with high mutation rates; however, according to our observations, they can still evolve slowly. The conservation of these SSRs suggests that they are under strong selection and could play critical roles in maintaining gene functions. PMID:24491104

  14. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  15. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  16. Draft Genome Sequence of Streptomyces hygroscopicus subsp. hygroscopicus NBRC 16556.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Oguchi, Akio; Hamada, Moriyuki; Tamura, Tomohiko; Suzuki, Ken-Ichiro; Fujita, Nobuyuki

    2016-01-01

    Here, we report the draft genome sequence of strain NBRC 16556, deposited as Streptomyces hygroscopicus subsp. hygroscopicus into the NBRC culture collection. An average nucleotide identity analysis confirmed that the taxonomic identification is correct. The genome sequence will serve as a valuable reference for genome mining to search for new secondary metabolites. PMID:27198007

  17. Complete Genome Sequence of the Embu Virus Strain SPAn880

    PubMed Central

    Antwerpen, Markus; Georgi, Enrico; Vette, Philipp; Zoeller, Gudrun; Meyer, Hermann

    2014-01-01

    We report the complete genome sequence of the Embu virus. The genome consists of 185,139 bp and is nearly identical to that of the Cotia virus. This is the first report on the Embu virus genome sequence, which has been considered an unclassified poxvirus until now. PMID:25477400

  18. Draft Genome Sequence of Alternaria alternata ATCC 34957.

    PubMed

    Nguyen, Hai D T; Lewis, Christopher T; Lévesque, C André; Gräfenhan, Tom

    2016-01-01

    We report the draft genome sequence of Alternaria alternata ATCC 34957. This strain was previously reported to produce alternariol and alternariol monomethyl ether on weathered grain sorghum. The genome was sequenced with PacBio technology and assembled into 27 scaffolds with a total genome size of 33.5 Mb. PMID:26769939

  19. Draft Genome Sequence of Alternaria alternata ATCC 34957

    PubMed Central

    Nguyen, Hai D. T.; Lewis, Christopher T.; Lévesque, C. André

    2016-01-01

    We report the draft genome sequence of Alternaria alternata ATCC 34957. This strain was previously reported to produce alternariol and alternariol monomethyl ether on weathered grain sorghum. The genome was sequenced with PacBio technology and assembled into 27 scaffolds with a total genome size of 33.5 Mb. PMID:26769939

  20. Genome sequence of the Pea Aphid Acyrthosiphon pisum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The International aphid genome consortium, IAGC, herein presents the 464 Mb draft genome assembly sequence of the pea aphid Acyrthosiphon pisum. This is the first published whole genome sequence from the diverse assemblage of hemimetabolous insects, providing an outgroup to the multiple published g...

  1. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics

    PubMed Central

    2003-01-01

    The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes. PMID:14624247
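The gene counts and repeat fractions quoted in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the figures stated in the abstract; the variable names are hypothetical, and because the genome sizes are approximate, the derived values are rough estimates rather than results from the paper.

```python
# Figures quoted in the abstract (approximate values).
predicted_genes = 19_500
gene_classes = {
    "clear C. elegans orthologs": 12_200,
    "one or more detectable homologs": 6_500,
    "no detectable match": 800,
}
genome_mbp = {"C. briggsae": 104.0, "C. elegans": 100.3}
repeat_fraction = {"C. briggsae": 0.224, "C. elegans": 0.165}

# The three gene classes should add up to the predicted gene total.
classified = sum(gene_classes.values())

# Convert repeat fractions into absolute repeat content (Mbp) per genome.
repeat_mbp = {sp: genome_mbp[sp] * repeat_fraction[sp] for sp in genome_mbp}

# Compare the genome size gap with the gap in repeat content.
size_gap = genome_mbp["C. briggsae"] - genome_mbp["C. elegans"]
repeat_gap = repeat_mbp["C. briggsae"] - repeat_mbp["C. elegans"]

print(classified, repeat_mbp, round(size_gap, 1), round(repeat_gap, 1))
```

The three classes sum to the 19,500 predicted genes. The estimated repeat content is about 23.3 Mbp for C. briggsae and 16.5 Mbp for C. elegans, so the difference in repeats is at least as large as the roughly 3.7 Mbp difference in genome size, in line with the abstract's statement that repeats account for almost all of it.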

  2. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  3. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  4. TAG Sequence Identification of Genomic Regions Using TAGdb.

    PubMed

    Ruperao, Pradeep

    2016-01-01

    Second-generation sequencing (SGS) technology has enabled the sequencing of genomes and identification of genes. However, large complex plant genomes remain particularly difficult for de novo assembly. Access to the vast quantity of raw sequence data may facilitate discoveries; however the volume of this data makes access difficult. This chapter discusses the Web-based tool TAGdb that enables researchers to identify paired read second-generation DNA sequence data that share identity with a submitted query sequence. The identified reads can be used for PCR amplification of genomic regions to identify genes and promoters without the need for genome assembly. PMID:26519409

  5. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  6. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website. Its text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolution processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data. PMID:27115625

  7. Detecting long tandem duplications in genomic sequences

    PubMed Central

    2012-01-01

    Background Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, software using a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non coding elements such as pseudo-genes or RNA genes. Conclusions ReD Tandem allows large tandem duplications to be identified without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based approach, which ignores duplications involving non coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also allow existing annotations to be improved. PMID:22568762

  8. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo

    PubMed Central

    Horton, Daniel L.; Marston, Denise A.; Johnson, Nicholas; Ellis, Richard J.; Fooks, Anthony R.; Hewson, Roger

    2016-01-01

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates. PMID:27081121

  9. Genome Sequence of Stachybotrys chartarum Strain 51-11.

    PubMed

    Betancourt, Doris A; Dean, Timothy R; Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful. PMID:26430036

  10. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo.

    PubMed

    Lumley, Sarah; Horton, Daniel L; Marston, Denise A; Johnson, Nicholas; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2016-01-01

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates. PMID:27081121

  11. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to state-of-the-art genomics capabilities: next generation sequencing, Sanger sequencing/genotyping and microarrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  12. Complete Genomic Sequence of Duck Flavivirus from China

    PubMed Central

    Liu, Ming; Liu, Chunguo; Li, Gang; Li, Xiaojun; Yin, Xiuchen; Chen, Yuhuan

    2012-01-01

    We report here the complete genomic sequence of the Chinese duck flavivirus TA strain. This work is the first to document the complete genomic sequence of this previously unknown duck flavivirus strain. The sequence will help further relevant epidemiological studies and extend our general knowledge of flaviviruses. PMID:22354941

  13. Current challenges in de novo plant genome sequencing and assembly

    PubMed Central

    2012-01-01

    Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community. PMID:22546054

  14. Genome sequencing of the important oilseed crop Sesamum indicum L

    PubMed Central

    2013-01-01

    The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described. PMID:23369264

  15. Draft Genome Sequences of 63 Pseudomonas aeruginosa Isolates Recovered from Cystic Fibrosis Sputum.

    PubMed

    Spilker, Theodore; LiPuma, John J

    2016-01-01

    Here, we report the draft genome sequences of 63 Pseudomonas aeruginosa isolates, recovered in culture of sputum from 15 individuals with cystic fibrosis (CF) receiving care in a single CF care center over a 13-year period. These sequences add value to studies of within-host evolution of bacterial pathogens during chronic infection. PMID:27103710

  16. Complete Genome Sequence of Human Norovirus GII.4_2006b, a Variant of Minerva 2006

    PubMed Central

    Yang, Zhihui; Mammel, Mark K.

    2016-01-01

    In 2006, the National Calicivirus Laboratory at the U.S. Centers for Disease Control and Prevention (CDC) confirmed multistate outbreaks of norovirus infection and identified two new GII.4 norovirus strains (Minerva and Laurens) through partial sequencing of the major capsid (VP1) gene. Here, we report the first complete genome sequence of the GII.4 Minerva isolate. PMID:26823589

  17. Draft Genome Sequences of 63 Pseudomonas aeruginosa Isolates Recovered from Cystic Fibrosis Sputum

    PubMed Central

    Spilker, Theodore

    2016-01-01

    Here, we report the draft genome sequences of 63 Pseudomonas aeruginosa isolates, recovered in culture of sputum from 15 individuals with cystic fibrosis (CF) receiving care in a single CF care center over a 13-year period. These sequences add value to studies of within-host evolution of bacterial pathogens during chronic infection. PMID:27103710

  18. Draft Genome Sequences of Klebsiella variicola Plant Isolates.

    PubMed

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena; Garza-Ramos, Ulises

    2015-01-01

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  19. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Draft Genome Sequences of Klebsiella variicola Plant Isolates

    PubMed Central

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena

    2015-01-01

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  1. Reconstructing cancer genomes from paired-end sequencing data

    PubMed Central

    2012-01-01

    Background A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/. PMID:22537039
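The idea of an interval-adjacency graph that admits an alternating Eulerian tour can be illustrated with a small balance check. The Python sketch below is not the PREGO implementation. It is a simplified illustration under stated assumptions: each interval has a "tail" and a "head" extremity, copy numbers and adjacency multiplicities are already known integers, and chromosome ends are closed by an artificial adjacency so that the tour is a cycle. The function name `is_balanced` and the toy data are hypothetical. The check tests a necessary condition for a tour that alternates interval edges and adjacency edges: at every extremity, the interval copy number must equal the total multiplicity of incident adjacencies.

```python
from collections import Counter


def is_balanced(copy_number, adjacencies):
    """Return True if, at every interval extremity, the interval copy number
    equals the summed multiplicity of the adjacencies touching it."""
    demand = Counter()
    for interval, copies in copy_number.items():
        demand[(interval, "tail")] += copies
        demand[(interval, "head")] += copies

    supply = Counter()
    for (u, v), multiplicity in adjacencies.items():
        supply[u] += multiplicity
        supply[v] += multiplicity  # a self-loop (u == v) is counted twice

    return all(demand[x] == supply[x] for x in set(demand) | set(supply))


# Toy example: reference order A-B-C, rearranged genome A-B-B-C
# (a tandem duplication of interval B).
copy_number = {"A": 1, "B": 2, "C": 1}
adjacencies = {
    (("A", "head"), ("B", "tail")): 1,  # reference adjacency A -> B
    (("B", "head"), ("B", "tail")): 1,  # novel adjacency created by the duplication
    (("B", "head"), ("C", "tail")): 1,  # reference adjacency B -> C
    (("C", "head"), ("A", "tail")): 1,  # artificial edge closing the tour (assumption)
}

print(is_balanced(copy_number, adjacencies))
print(is_balanced({"A": 1, "B": 1, "C": 1}, adjacencies))
```

With a copy number of 2 for B the toy graph is balanced, whereas a copy number of 1 leaves the duplication adjacency unsupported. Measured copy numbers and adjacencies are noisy in practice, which is why the abstract describes the reconstruction as an optimization problem and not a direct check like this one.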

  2. Complete Genome Sequence of Corynebacterium pseudotuberculosis Viscerotropic Strain N1

    PubMed Central

    Portela, Ricardo W.; Sousa, Thiago J.; Rocha, Flávia; Pereira, Felipe L.; Dorella, Fernanda A.; Carvalho, Alex F.; Menezes, Nildo; Macedo, Eduardo S.; Moura-Costa, Lilia F.; Meyer, Roberto; Leal, Carlos A. G.; Figueiredo, Henrique C.; Azevedo, Vasco

    2016-01-01

    We present the complete genome sequence of Corynebacterium pseudotuberculosis strain N1. The sequencing was performed with the Ion Torrent Personal Genome Machine system. The genome is a circular chromosome with 2,337,845 bp, a G+C content of 52.85%, and a total of 2,045 coding sequences, 12 rRNAs, 49 tRNAs, and 58 pseudogenes. PMID:26823597

  3. Complete Genome Sequence of Corynebacterium pseudotuberculosis Viscerotropic Strain N1.

    PubMed

    Loureiro, Dan; Portela, Ricardo W; Sousa, Thiago J; Rocha, Flávia; Pereira, Felipe L; Dorella, Fernanda A; Carvalho, Alex F; Menezes, Nildo; Macedo, Eduardo S; Moura-Costa, Lilia F; Meyer, Roberto; Leal, Carlos A G; Figueiredo, Henrique C; Azevedo, Vasco

    2016-01-01

    We present the complete genome sequence of Corynebacterium pseudotuberculosis strain N1. The sequencing was performed with the Ion Torrent Personal Genome Machine system. The genome is a circular chromosome with 2,337,845 bp, a G+C content of 52.85%, and a total of 2,045 coding sequences, 12 rRNAs, 49 tRNAs, and 58 pseudogenes. PMID:26823597

  4. Caenorhabditis elegans mutant allele identification by whole-genome sequencing.

    PubMed

    Sarin, Sumeet; Prabhu, Snehit; O'Meara, M Maggie; Pe'er, Itsik; Hobert, Oliver

    2008-10-01

    Identification of the molecular lesion in Caenorhabditis elegans mutants isolated through forward genetic screens usually involves time-consuming genetic mapping. We used Illumina deep sequencing technology to sequence a complete, mutant C. elegans genome and thus pinpointed a single-nucleotide mutation in the genome that affects a neuronal cell fate decision. This constitutes a proof-of-principle for using whole-genome sequencing to analyze C. elegans mutants. PMID:18677319

  5. Genome sequencing and annotation of Proteus sp. SAS71

    PubMed Central

    Selim, Samy; Hassan, Sherif; Hagagy, Nashwa

    2015-01-01

    We report the draft genome sequence of Proteus sp. strain SAS71, isolated from a water spring in the Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000. PMID:26697338

  6. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  7. Complete genome sequence of the alkaliphilic bacterium Bacillus halodurans and genomic sequence comparison with Bacillus subtilis.

    PubMed

    Takami, H; Nakasone, K; Takaki, Y; Maeno, G; Sasaki, R; Masui, N; Fuji, F; Hirama, C; Nakamura, Y; Ogasawara, N; Kuhara, S; Horikoshi, K

    2000-11-01

    The 4 202 353 bp genome of the alkaliphilic bacterium Bacillus halodurans C-125 contains 4066 predicted protein coding sequences (CDSs), 2141 (52.7%) of which have functional assignments, 1182 (29%) of which are conserved CDSs with unknown function and 743 (18.3%) of which have no match to any protein database. Among the total CDSs, 8.8% match sequences of proteins found only in Bacillus subtilis and 66.7% are widely conserved in comparison with the proteins of various organisms, including B. subtilis. The B. halodurans genome contains 112 transposase genes, indicating that transposases have played an important evolutionary role in horizontal gene transfer and also in internal genetic rearrangement in the genome. Strain C-125 lacks some of the necessary genes for competence, such as comS, srfA and rapC, supporting the fact that competence has not been demonstrated experimentally in C-125. There is no paralog of tupA, encoding teichuronopeptide, which contributes to alkaliphily, in the C-125 genome and an ortholog of tupA cannot be found in the B. subtilis genome. Out of 11 sigma factors which belong to the extracytoplasmic function family, 10 are unique to B. halodurans, suggesting that they may have a role in the special mechanism of adaptation to an alkaline environment. PMID:11058132

  8. [Sequence and organization of Muntiacus reevesi mitochondrial genome].

    PubMed

    Zhang, Xiao-Mei; Shan, Xiang-Nian; Shi, Yan-Feng; Zhang, Hai-Jun; Li, Jian; Zheng, Ai-Ling

    2004-11-01

    A shotgun DNA sequencing strategy was employed, in which a mitochondrial genome library of Muntiacus reevesi was constructed to obtain the complete mitochondrial genome sequence. The Chinese muntjac's mitochondrial genome, consisting of 16354 base pairs that encode genes for 13 proteins, 2 rRNAs, and 22 tRNAs, is similar to those of other mammals in both gene order and orientation. The sequences of the rRNA genes, some of the protein-coding regions, and the tRNAs are highly homologous among mammals. Differences in the length and sequence of the D-loop regions account for the variation among mammalian mitochondrial genomes. PMID:15640115

  9. Sequencing and Assembly of the 22-Gb Loblolly Pine Genome

    PubMed Central

    Zimin, Aleksey; Stevens, Kristian A.; Crepeau, Marc W.; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L.; de Jong, Pieter J.; Neale, David B.; Salzberg, Steven L.; Yorke, James A.; Langley, Charles H.

    2014-01-01

    Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. PMID:24653210
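
The N50 scaffold size quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch follows; the function name `n50` and the toy lengths are illustrative assumptions, not values from the loblolly pine assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # The longest pieces down to this one cover at least half the assembly.
            return length


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembled scaffold can raise N50 as easily as a correct one.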

  10. Sequencing and assembly of the 22-gb loblolly pine genome.

    PubMed

    Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

    2014-03-01

    Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. PMID:24653210

  11. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  12. Compiling Multicopy Single-Stranded DNA Sequences from Bacterial Genome Sequences

    PubMed Central

    Yoo, Wonseok; Lim, Dongbin

    2016-01-01

    A retron is a bacterial retroelement that encodes an RNA gene and a reverse transcriptase (RT). The former, once transcribed, works as a template primer for reverse transcription by the latter. The resulting DNA is covalently linked to the upstream part of the RNA; this chimera is called multicopy single-stranded DNA (msDNA), which is extrachromosomal DNA found in many bacterial species. Based on the conserved features in the eight known msDNA sequences, we developed a detection method and applied it to scan National Center for Biotechnology Information (NCBI) RefSeq bacterial genome sequences. Among 16,844 bacterial sequences possessing a retron-type RT domain, we identified 48 unique types of msDNA. Currently, the biological role of msDNA is not well understood. Our work will be a useful tool in studying the distribution, evolution, and physiological role of msDNA. PMID:27103888

  13. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory a number of fungi are used as experimental systems to study a wide range of biological processes and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  14. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution. PMID:26620110

  15. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
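
The pan- and core-genome calculation mentioned above reduces, in its simplest form, to set operations over per-genome gene family assignments. The sketch below assumes the gene families have already been clustered; the function name `pan_and_core` and the toy genome and family labels are hypothetical, and real analyses usually tolerate some missing families in draft genomes rather than requiring strict presence in every genome.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene family sets.

    genomes: dict mapping genome name -> iterable of gene family identifiers
    pan:  families present in at least one genome (union)
    core: families present in every genome (intersection)
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


toy = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famC", "famD"},
}
pan, core = pan_and_core(toy)
print(len(pan), len(core))  # 4 1
```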

  16. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  17. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences.
The following represents a set of six community-defined categories of genome sequence standards that better reflect the quality of the genome sequence, based on our collective understanding of the different technologies, available assemblers, and the varied efforts to improve upon drafted genomes. Due to the increasingly rapid pace of genomics we avoided the use of rigid numerical thresholds in our definitions to take into account the types of products achieved by any combination of technology, chemistry, assembler, or improvement/finishing process.

  18. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

  19. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

  20. Validation of rice genome sequence by optical mapping

    PubMed Central

    Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

    2007-01-01

    Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms.
Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties. PMID:17697381
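
The in silico restriction maps referred to above can be approximated by scanning a sequence for the enzyme's recognition site and recording the resulting fragment lengths. The sketch below is an illustration only and not the authors' pipeline: it treats the sequence as a linear molecule, uses the SwaI recognition site ATTTAAAT with a blunt cut at its midpoint, and the function and constant names are assumptions made here.

```python
SWAI_SITE = "ATTTAAAT"   # SwaI recognition sequence (palindromic, so one strand suffices)
SWAI_CUT_OFFSET = 4      # blunt cut between ATTT and AAAT


def in_silico_fragments(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return ordered fragment lengths from digesting a linear sequence at every site."""
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # step by one so overlapping sites are found
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


print(in_silico_fragments("GGGG" + "ATTTAAAT" + "CCCCCC"))  # [8, 10]
```

An ordered list of fragment lengths like this is what gets aligned against the optical map; discordant fragment sizes then point to candidate gaps or misassemblies in the sequence.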

  1. SEQUENCING THE PIG GENOME USING A BAC BY BAC APPROACH

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have generated a highly contiguous physical map covering >98% of the pig genome in just 176 contigs. The map is localized to the genome through integration with the UIVC RH map as well as BAC end sequence alignments to the human genome. Over 265k HindIII restriction digest fingerprints totaling 16.2...

  2. Genome Sequence of Mushroom Soft-Rot Pathogen Janthinobacterium agaricidamnosum.

    PubMed

    Graupner, Katharina; Lackner, Gerald; Hertweck, Christian

    2015-01-01

    Janthinobacterium agaricidamnosum causes soft-rot disease of the cultured button mushroom Agaricus bisporus and is thus responsible for agricultural losses. Here, we present the genome sequence of J. agaricidamnosum DSM 9628. The 5.9-Mb genome harbors several secondary metabolite biosynthesis gene clusters, which renders this neglected bacterium a promising source for genome mining approaches. PMID:25883287

  3. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate

    PubMed Central

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T.

    2015-01-01

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available. PMID:26450735

  4. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  5. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    SciTech Connect

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  6. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  7. Complete Genome Sequences of Five Paenibacillus larvae Bacteriophages.

    PubMed

    Sheflo, Michael A; Gardner, Adam V; Merrill, Bryan D; Fisher, Joshua N B; Lunt, Bryce L; Breakwell, Donald P; Grose, Julianne H; Burnett, Sandra H

    2013-01-01

    Paenibacillus larvae is a pathogen of honeybees that causes American foulbrood (AFB). We isolated bacteriophages from soil containing bee debris collected near beehives in Utah. We announce five high-quality complete genome sequences, which represent the first completed genome sequences submitted to GenBank for any P. larvae bacteriophage. PMID:24233582

  8. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge.

    PubMed

    Reveille, Alexandra M; Eldridge, Kimberly A; Temple, Louise M

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  9. Draft Genome Sequence of the Wolbachia Endosymbiont of Drosophila suzukii

    PubMed Central

    Cestaro, Alessandro; Kaur, Rupinder; Pertot, Ilaria; Rota-Stabelli, Omar; Anfora, Gianfranco

    2013-01-01

    Wolbachia is one of the most successful and abundant symbiotic bacteria in nature, infecting more than 40% of the terrestrial arthropod species. Here we report the draft genome sequence of a novel Wolbachia strain named “wSuzi” that was retrieved from the genome sequencing of its host, the invasive pest Drosophila suzukii. PMID:23472225

  10. Draft Genome Sequence of the Wolbachia Endosymbiont of Drosophila suzukii.

    PubMed

    Siozios, Stefanos; Cestaro, Alessandro; Kaur, Rupinder; Pertot, Ilaria; Rota-Stabelli, Omar; Anfora, Gianfranco

    2013-01-01

    Wolbachia is one of the most successful and abundant symbiotic bacteria in nature, infecting more than 40% of the terrestrial arthropod species. Here we report the draft genome sequence of a novel Wolbachia strain named "wSuzi" that was retrieved from the genome sequencing of its host, the invasive pest Drosophila suzukii. PMID:23472225

  11. The Prospects for Sequencing the Western Corn Rootworm Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Historically, obtaining the complete sequence of eukaryotic genomes has been an expensive and complex task. For this reason, efforts to sequence insect genomes have largely been confined to model organisms, species that are important to human health, and representative species from a few insect orde...

  12. Draft Genome Sequence of "Cohnella kolymensis" B-2846.

    PubMed

    Karlyshev, Andrey V; Kudryashova, Ekaterina B; Ariskina, Elena V

    2016-01-01

    A draft genome sequence of "Cohnella kolymensis" strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  13. De Novo Genome Sequence of Yersinia aleksiciae Y159T

    PubMed Central

    Neubauer, Heinrich

    2015-01-01

    We report here on the genome sequence of Yersinia aleksiciae Y159T, isolated in Finland in 1981. The genome has a size of 4 Mb, a G+C content of 49%, and is predicted to contain 3,423 coding sequences. PMID:26383649

  14. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    PubMed Central

    Bayjanov, Jumamurat R.; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; Siezen, Roland; van Hijum, Sacha A. F. T.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence. PMID:26607887

  15. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge

    PubMed Central

    Reveille, Alexandra M.; Eldridge, Kimberly A.

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  16. Complete Genome Sequence of Enterococcus faecium ATCC 700221.

    PubMed

    McKenney, Peter T; Ling, Lilan; Wang, Guilin; Mane, Shrikant; Pamer, Eric G

    2016-01-01

    We report the complete genome sequence of a vancomycin-resistant isolate of Enterococcus faecium derived from human feces. The genome comprises one chromosome of 2.9 Mb and three plasmids. The strain harbors a plasmid-borne vanA-type vancomycin resistance locus and is a member of multilocus sequencing type (MLST) cluster ST-17. PMID:27198022

  17. Genome sequencing and annotation of Cellulomonas sp. HZM

    PubMed Central

    Chua, Patric; Har, Zi Mei; Austin, Christopher M.; Yule, Catherine M.; Dykes, Gary A.; Lee, Sui Mae

    2015-01-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA). PMID:26484221

  18. Genome sequencing and annotation of Cellulomonas sp. HZM.

    PubMed

    Chua, Patric; Har, Zi Mei; Austin, Christopher M; Yule, Catherine M; Dykes, Gary A; Lee, Sui Mae

    2015-09-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA). PMID:26484221

  19. Draft Genome Sequence of Mycobacterium heraklionense Strain Davo

    PubMed Central

    Greninger, Alexander L.; Cunningham, Gail; Chiu, Charles Y.

    2015-01-01

    We report the draft genome sequence of Mycobacterium heraklionense strain Davo, isolated from a fine-needle aspirate of a right-ankle soft-tissue mass. This is the first draft genome sequence of Mycobacterium heraklionense, a nonpigmented rapidly growing mycobacterium. PMID:26205863

  20. Complete genome sequence of Melissococcus plutonius ATCC 35311.

    PubMed

    Okumura, Kayo; Arai, Rie; Okura, Masatoshi; Kirikae, Teruo; Takamatsu, Daisuke; Osaki, Makoto; Miyoshi-Akiyama, Tohru

    2011-08-01

    We report the first completely annotated genome sequence of Melissococcus plutonius ATCC 35311. M. plutonius is a one-genus, one-species bacterium and the etiological agent of European foulbrood of the honeybee. The genome sequence will provide new insights into the molecular mechanisms underlying its pathogenicity. PMID:21622755

  1. Complete Genome Sequence of Melissococcus plutonius ATCC 35311

    PubMed Central

    Okumura, Kayo; Arai, Rie; Okura, Masatoshi; Kirikae, Teruo; Takamatsu, Daisuke; Osaki, Makoto; Miyoshi-Akiyama, Tohru

    2011-01-01

    We report the first completely annotated genome sequence of Melissococcus plutonius ATCC 35311. M. plutonius is a one-genus, one-species bacterium and the etiological agent of European foulbrood of the honeybee. The genome sequence will provide new insights into the molecular mechanisms underlying its pathogenicity. PMID:21622755

  2. Whole-genome sequence of Streptococcus pseudopneumoniae isolate IS7493.

    PubMed

    Shahinas, Dea; Tamber, Gurdeep Singh; Arya, Gitanjali; Wong, Andrew; Lau, Rachel; Jamieson, Frances; Ma, Jennifer H; Alexander, David C; Low, Donald E; Pillai, Dylan R

    2011-11-01

    Streptococcus pseudopneumoniae is a member of the viridans group streptococci (VGS) whose pathogenic significance is unclear. We announce the complete genome sequence of S. pseudopneumoniae IS7493. The genome sequence will assist in the characterization of this new organism and facilitate the development of accurate diagnostic assays to distinguish it from Streptococcus pneumoniae and Streptococcus mitis. PMID:21994930

  3. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  4. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data. PMID:25792042

  5. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. 
    The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information. PMID:16773396

  6. From complete genome sequence to “complete” understanding?

    PubMed Central

    Galperin, Michael Y.; Koonin, Eugene V.

    2011-01-01

    The rapidly accumulating genome sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. A major problem in genomics is the widening gap between the rapid progress in genome sequencing and the comparatively slow progress in the functional characterization of sequenced genomes. Here we discuss two key questions of genome biology: whether we need more genomes, and how deep is our understanding of biology based on genomic analysis. We argue that overly specific annotations of gene functions are often less useful than the more generic, but also more robust, functional assignments based on protein family classification. We also discuss problems in understanding the functions of the remaining “conserved hypothetical” genes. PMID:20647113

  7. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  8. Draft Sequences of the Radish (Raphanus sativus L.) Genome

    PubMed Central

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-01-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  9. Genome sequencing and annotation of Serratia sp. strain TEL

    PubMed Central

    Lephoto, Tiisetso E.; Gray, Vincent M.

    2015-01-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000. PMID:26697332

  10. Genome sequencing and annotation of Serratia sp. strain TEL.

    PubMed

    Lephoto, Tiisetso E; Gray, Vincent M

    2015-12-01

    We present the annotation of the draft genome sequence of Serratia sp. strain TEL (GenBank accession number KP711410). This organism was isolated from entomopathogenic nematode Oscheius sp. strain TEL (GenBank accession number KM492926) collected from grassland soil and has a genome size of 5,000,541 bp and 542 subsystems. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LDEG00000000. PMID:26697332

  11. Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing

    PubMed Central

    Beitzel, Brett; Chain, Patrick S. G.; Davenport, Matthew G.; Donaldson, Eric; Frieman, Matthew; Kugelman, Jeffrey; Kuhn, Jens H.; O’Rear, Jules; Sabeti, Pardis C.; Wentworth, David E.; Wiley, Michael R.; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher

    2014-01-01

    Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques. PMID:24939889

  12. Microbial genome sequencing using optical mapping and Illumina sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  13. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  14. Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14.

    PubMed

    Stahl, Buffy; Barrangou, Rodolphe

    2013-01-01

    We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified. PMID:23788546

  15. The Brachypodium genome sequence: a resource for oat genomics research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Oat (Avena sativa) is an important cereal crop used as both an animal feed and for human consumption. Genetic and genomic research on oat is hindered because it is hexaploid and possesses a large (13 Gb) genome. Diploid Avena relatives have been employed for genetic and genomic studies, but only mod...

  16. Genomics:GTL Bioenergy Research Centers White Paper

    SciTech Connect

    Mansfield, Betty Kay; Alton, Anita Jean; Andrews, Shirley H; Bownas, Jennifer Lynn; Casey, Denise; Martin, Sheryl A; Mills, Marissa; Nylander, Kim; Wyrick, Judy M; Drell, Dr. Daniel; Weatherwax, Sharlene; Carruthers, Julie

    2006-08-01

    In his Advanced Energy Initiative announced in January 2006, President George W. Bush committed the nation to new efforts to develop alternative sources of energy to replace imported oil and fossil fuels. Developing cost-effective and energy-efficient methods of producing renewable alternative fuels such as cellulosic ethanol from biomass and solar-derived biofuels will require transformational breakthroughs in science and technology. Incremental improvements in current bioenergy production methods will not suffice. The Genomics:GTL Bioenergy Research Centers will be dedicated to fundamental research on microbe and plant systems with the goal of developing knowledge that will advance biotechnology-based strategies for biofuels production. The aim is to spur substantial progress toward cost-effective production of biologically based renewable energy sources. This document describes the rationale for the establishment of the centers and their objectives in light of the U.S. Department of Energy's mission and goals. Developing energy-efficient and cost-effective methods of producing alternative fuels such as cellulosic ethanol from biomass will require transformational breakthroughs in science and technology. Incremental improvements in current bioenergy-production methods will not suffice. The focus on microbes (for cellular mechanisms) and plants (for source biomass) fundamentally exploits capabilities well known to exist in the microbial world. Thus 'proof of concept' is not required, but considerable basic research into these capabilities remains an urgent priority. Several developments have converged in recent years to suggest that systems biology research into microbes and plants promises solutions that will overcome critical roadblocks on the path to cost-effective, large-scale production of cellulosic ethanol and other renewable energy from biomass. 
    The ability to rapidly sequence the DNA of any organism is a critical part of these new capabilities, but it is only a first step. Other advances include the growing number of high-throughput techniques for protein production and characterization; a range of new instrumentation for observing proteins and other cell constituents; the rapid growth of commercially available reagents for protein production; a new generation of high-intensity light sources that provide precision imaging on the nanoscale and allow observation of molecular interactions in ultrafast time intervals; major advances in computational capability; and the continually increasing numbers of these instruments and technologies within the national laboratory infrastructure, at universities, and in private industry. All these developments expand our ability to elucidate mechanisms present in living cells, but much more remains to be done. The Centers are designed to accomplish GTL program objectives more rapidly, more effectively, and at reduced cost by concentrating appropriate technologies and scientific expertise, from genome sequence to an integrated systems understanding of the pathways and internal structures of microbes and plants most relevant to developing bioenergy compounds. The Centers will seek to understand the principles underlying the structural and functional design of selected microbial, plant, and molecular systems. This will be accomplished by building technological pathways linking the genome-determined components in an organism with bioenergy-relevant cellular systems that can be characterized sufficiently to generate realistic options for biofuel development.
    In addition, especially in addressing what are believed to be nearer-term approaches to renewable energy (e.g., producing cellulosic ethanol cost-effectively and energy-efficiently), the Center research team must understand in depth the current industrial-level roadblocks and bottlenecks (see section, GTL's Vision for Biological Energy Alternatives, below). For the Centers, and indeed the entire BER effort, to be successful, Center research must be integrated with individual investigator research, and coordination of activities, from DNA sequencing to high-throughput protein development and characterization.

  17. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  18. Genome Sequence of the Trichosporon asahii Environmental Strain CBS 8904

    PubMed Central

    Li, Hai Tao; Zhu, He; Zhou, Guang Peng; Wang, Meng; Wang, Lei

    2012-01-01

    This is the first report of the genome sequence of Trichosporon asahii environmental strain CBS 8904, which was isolated from maize cobs. Comparison of the genome sequence with that of clinical strain CBS 2479 revealed that they have >99% chromosomal and mitochondrial sequence identity, yet CBS 8904 has 368 specific genes. Analysis of clusters of orthologous groups predicted that 3,307 genes belong to 23 functional categories and 703 genes were predicted to have a general function. PMID:23193141

  19. Complete genome sequence of Mycoplasma haemofelis, a hemotropic mycoplasma.

    PubMed

    Barker, Emily N; Helps, Chris R; Peters, Iain R; Darby, Alistair C; Radford, Alan D; Tasker, Séverine

    2011-04-01

    Here, we present the genome sequence of Mycoplasma haemofelis strain Langford 1, representing the first hemotropic mycoplasma (hemoplasma) species to be completely sequenced and annotated. Originally isolated from a cat with hemolytic anemia, this strain induces severe hemolytic anemia when inoculated into specific-pathogen-free-derived cats. The genome sequence has provided insights into the biology of this uncultivatable hemoplasma and has identified potential molecular mechanisms underlying its pathogenicity. PMID:21317334

  20. Complete Genome Sequence of Salmonella Bacteriophage SS3e

    PubMed Central

    Kim, Sung-Hun; Park, Jeong-Hyun; Lee, Bok-Kwon; Kwon, Hyuk-Joon; Shin, Ji-Hyun; Kim, Jungmin

    2012-01-01

    A Salmonella lytic bacteriophage, SS3e, was isolated, and its genome was sequenced completely. This phage is able to lyse not only various Salmonella serovars but also Escherichia coli, Shigella sonnei, Enterobacter cloacae, and Serratia marcescens, indicating a broad host specificity. Genomic sequence analysis of SS3e revealed a linear double-stranded DNA sequence of 40,793 bp harboring 58 open reading frames, which is highly similar to Salmonella phages SETP13 and MB78. PMID:22923809

  1. Real-time, portable genome sequencing for Ebola surveillance.

    PubMed

    Quick, Joshua; Loman, Nicholas J; Duraffour, Sophie; Simpson, Jared T; Severi, Ettore; Cowley, Lauren; Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan H J; Becker-Ziaja, Beate; Boettcher, Jan Peter; Cabeza-Cabrerizo, Mar; Camino-Sánchez, Álvaro; Carter, Lisa L; Doerrbecker, Juliane; Enkirch, Theresa; García-Dorival, Isabel; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigael; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallasch, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Y; Sachse, Andreas; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Racine, Trina; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N'Faly; Williams, Cecelia V; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Frank; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, James; Rachwal, Phillip; Turner, Daniel J; Pollakis, Georgios; Hiscox, Julian A; Matthews, David A; O'Shea, Matthew K; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Wölfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A; Koivogui, Lamine; Diallo, Boubacar; Keïta, Sakoba; Rambaut, Andrew; Formenty, Pierre; Günther, Stephan; Carroll, Miles W

    2016-02-11

    The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485
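    As a rough check of the "16-27 mutations in each genome" figure quoted in this abstract, the per-site substitution rate can be multiplied by the genome length. The sketch below assumes a genome length of about 18,959 nucleotides and a one-year interval; neither value is stated in the abstract, and the function name is purely illustrative.

```python
# Hypothetical helper, not from the cited paper.
# Assumption: EBOV genome length of roughly 18,959 nt (not given in the abstract).
EBOV_GENOME_LENGTH_NT = 18_959


def expected_mutations(rate_per_site_per_year, genome_length_nt, years=1.0):
    """Expected number of substitutions accumulated per genome over `years`."""
    return rate_per_site_per_year * genome_length_nt * years


# Lower and upper bounds of the substitution rate quoted in the abstract.
low = expected_mutations(0.87e-3, EBOV_GENOME_LENGTH_NT)
high = expected_mutations(1.42e-3, EBOV_GENOME_LENGTH_NT)

print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

    Under these assumptions the two bounds round to 16 and 27, which matches the range given in the abstract if that range is read as a per-year figure.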

  2. Data structures and compression algorithms for genomic sequence data

    PubMed Central

    Brandon, Marty C.; Wallace, Douglas C.; Baldi, Pierre

    2009-01-01

    Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage, navigation and privacy of genomic data. Here, we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and protecting the data. Results: The general idea is to encode only the differences between a genome sequence and a reference sequence, using absolute or relative coordinates for the location of the differences. These locations and the corresponding differential variants can be encoded into binary strings using various entropy coding methods, from fixed codes such as Golomb and Elias codes, to variable codes, such as Huffman codes. We demonstrate the approach and various tradeoffs using highly variable human mitochondrial genome sequences as a testbed. With only a partial level of optimization, 3615 genome sequences occupying 56 MB in GenBank are compressed down to only 167 KB, achieving a 345-fold compression rate, using the revised Cambridge Reference Sequence as the reference sequence. Using the consensus sequence as the reference sequence, the data can be stored using only 133 KB, corresponding to a 433-fold level of compression, roughly a 23% improvement. Extensions to nuclear genomes and high-throughput sequencing data are discussed. Availability: Data are publicly available from GenBank, the HapMap web site, and the MITOMAP database. Supplementary materials with additional results, statistics, and software implementations are available from http://mammag.web.uci.edu/bin/view/Mitowiki/ProjectDNACompression. Contact: pfbaldi@ics.uci.edu PMID:19447783
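    The idea of storing only the differences from a reference sequence can be illustrated with a minimal sketch. It is not the authors' implementation: it handles substitutions only (no insertions or deletions), uses relative coordinates (the gap since the previous difference) written as Elias gamma codes, and spends a fixed 2 bits on each substituted base. All function names are hypothetical.

```python
BASE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def elias_gamma(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for positive integers")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def encode(reference, sample):
    """Encode `sample` as (gap, base) pairs relative to `reference`."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    bits = []
    previous = -1
    for position, (ref_base, base) in enumerate(zip(reference, sample)):
        if ref_base != base:
            gap = position - previous  # relative coordinate, always >= 1
            bits.append(elias_gamma(gap) + BASE_BITS[base])
            previous = position
    return "".join(bits)


def decode(reference, bits):
    """Rebuild the sample from the reference and the encoded differences."""
    sequence = list(reference)
    i = 0
    previous = -1
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the length prefix of the gamma code
            zeros += 1
            i += 1
        gap = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        position = previous + gap
        sequence[position] = BITS_BASE[bits[i:i + 2]]
        i += 2
        previous = position
    return "".join(sequence)


reference = "ACGTACGTACGT"
sample = "ACGTTCGTACGA"  # differs from the reference at positions 4 and 11
encoded = encode(reference, sample)
print(encoded, len(encoded), "bits versus", 2 * len(sample), "bits uncompressed")
```

    With two differences in twelve bases the toy example needs 14 bits instead of 24; the saving grows as sequences become longer and more similar to the reference, which is the situation the abstract describes for mitochondrial genomes.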

  3. Whole-exome targeted sequencing of the uncharacterized pine genome.

    PubMed

    Neves, Leandro G; Davis, John M; Barbazuk, William B; Kirst, Matias

    2013-07-01

    The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of all probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm. PMID:23551702

  4. Advances in understanding cancer genomes through second-generation sequencing.

    PubMed

    Meyerson, Matthew; Gabriel, Stacey; Getz, Gad

    2010-10-01

    Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing) - through whole-genome, whole-exome and whole-transcriptome approaches - is allowing substantial advances in cancer genomics. These methods are facilitating an increase in the efficiency and resolution of detection of each of the principal types of somatic cancer genome alterations, including nucleotide substitutions, small insertions and deletions, copy number alterations, chromosomal rearrangements and microbial infections. This Review focuses on the methodological considerations for characterizing somatic genome alterations in cancer and the future prospects for these approaches. PMID:20847746

  5. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C.; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chuyu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela; Panaud, Olivier; Kellogg, Elizabeth A.; Brutnell, Thomas P.; Doust, Andrew N.; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  6. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Yang, Xiaohan; Ye, Chuyu; Tuskan, Gerald A

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  7. Marsupial genome sequences: providing insight into evolution and disease.

    PubMed

    Deakin, Janine E

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  8. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    PubMed Central

    Deakin, Janine E.

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  9. First Complete Genome Sequences of Two Keystone Viruses from Florida

    PubMed Central

    Stockwell, Timothy B.; Heberlein-Larson, Lea A.; Tan, Yi; Halpin, Rebecca A.; Fedorova, Nadia; Katzel, Daniel A.; Smole, Sandra; Unnasch, Thomas R.; Kramer, Laura D.

    2015-01-01

    We report here the first complete sequences of two Keystone virus (KEYV) genomes isolated from Florida in 2005, which include the first two publicly available complete large (L) gene sequences. The sequences of the KEYV L segments show 75.99 to 83.86% nucleotide similarity with those of other viruses in the California (CAL) serogroup of bunyaviruses. PMID:26514762

  10. An international plan to sequence the nuclear genome of onion

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As large-scale DNA sequencing technologies become more efficient and less costly, the genomic DNAs of more and more plants are being sequenced, assembled, and annotated. These complete sequences are extremely valuable for the identification of specific genes associated with important phenotypes. Thi...

  11. First Complete Genome Sequences of Two Keystone Viruses from Florida.

    PubMed

    Stockwell, Timothy B; Heberlein-Larson, Lea A; Tan, Yi; Halpin, Rebecca A; Fedorova, Nadia; Katzel, Daniel A; Smole, Sandra; Unnasch, Thomas R; Kramer, Laura D; Das, Suman R

    2015-01-01

    We report here the first complete sequences of two Keystone virus (KEYV) genomes isolated from Florida in 2005, which include the first two publicly available complete large (L) gene sequences. The sequences of the KEYV L segments show 75.99 to 83.86% nucleotide similarity with those of other viruses in the California (CAL) serogroup of bunyaviruses. PMID:26514762

  12. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    PubMed Central

    Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304674

  13. Complete genome sequence of Spirosoma linguale type strain (1T)

    SciTech Connect

    Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth H; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Detter, J. Chris; Schutze, Andrea; Rohde, Manfred; Tindall, Brian; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-01-01

    Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Genome sequence of the Nocardia bacteriophage NBR1.

    PubMed

    Petrovski, Steve; Seviour, Robert J; Tillett, Daniel

    2014-01-01

    We here characterize a novel bacteriophage (NBR1) that is lytic for Nocardia otitidiscaviarum and N. brasiliensis. NBR1 is a member of the family Siphoviridae and appears to have a structurally more complex tail than previously reported Siphoviridae phages. NBR1 has a linear genome of 46,140 bp and a sequence that appears novel when compared to those of other phage sequences in GenBank. Annotation of the genome reveals 68 putative open reading frames. The phage genome organization appears to be similar to other Siphoviridae phage genomes in that it has a modular arrangement. PMID:23913189

  19. Genome sequencing: a systematic review of health economic evidence

    PubMed Central

    2013-01-01

    Recently the sequencing of the human genome has become a major biological and clinical research field. However, the public health impact of this new technology, with a focus on the financial effect, cannot yet be foreseen. To provide an overview of the current health economic evidence for genome sequencing, we conducted a thorough systematic review of the literature from 17 databases. In addition, we conducted a hand search. Starting with 5 520 records we ultimately included five full-text publications and one internet source, all focused on cost calculations. The results were very heterogeneous and, therefore, difficult to compare. Furthermore, because the methodology of the publications was quite poor, the reliability and validity of the results were questionable. The real costs for the whole sequencing workflow, including data management and analysis, remain unknown. Overall, our review indicates that the current health economic evidence for genome sequencing is quite poor. Therefore, we listed aspects that needed to be considered when conducting health economic analyses of genome sequencing. Thereby, specifics regarding the overall aim, technology, population, indication, comparator, alternatives after sequencing, outcomes, probabilities, and costs with respect to genome sequencing are discussed. For further research, at the outset, a comprehensive cost calculation of genome sequencing is needed, because all further health economic studies rely on valid cost data. The results will serve as an input parameter for budget-impact analyses or cost-effectiveness analyses. PMID:24330507

  20. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1

    PubMed Central

    Kalischuk, Melanie; Hachey, John

    2015-01-01

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions. PMID:26272557

  1. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1.

    PubMed

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-01-01

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions. PMID:26272557

  2. Genome sequencing and annotation of Aeromonas sp. HZM.

    PubMed

    Chua, Patric; Har, Zi Mei; Austin, Christopher M; Yule, Catherine M; Dykes, Gary A; Lee, Sui Mae

    2015-09-01

    We report the draft genome sequence of Aeromonas sp. strain HZM, isolated from tropical peat swamp forest soil. The draft genome size is 4,451,364 bp with a G + C content of 61.7% and contains 10 rRNA sequences (eight copies of 5S rRNA genes, single copy of 16S and 23S rRNA each). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. JEMQ00000000. PMID:26484220
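
    For readers unfamiliar with the G + C statistic reported here, the following is a small sketch of how such a percentage is conventionally computed from a nucleotide string. The function name is hypothetical, and skipping ambiguous symbols such as N is an assumption of this sketch, not something stated in the record.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy input: 4 of 6 unambiguous bases are G or C.
example = gc_content("ggccNNat")
print(f"{example:.1f}% G + C")
```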

  3. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

  4. Genome sequencing and annotation of Aeromonas sp. HZM

    PubMed Central

    Chua, Patric; Har, Zi Mei; Austin, Christopher M.; Yule, Catherine M.; Dykes, Gary A.; Lee, Sui Mae

    2015-01-01

    We report the draft genome sequence of Aeromonas sp. strain HZM, isolated from tropical peat swamp forest soil. The draft genome size is 4,451,364 bp with a G + C content of 61.7% and contains 10 rRNA sequences (eight copies of 5S rRNA genes, single copy of 16S and 23S rRNA each). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. JEMQ00000000. PMID:26484220

  5. Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 12C.

    PubMed

    Sousa, Thiago Jesus; Mariano, Diego; Parise, Doglas; Parise, Mariana; Viana, Marcus Vinicius Canário; Guimarães, Luis Carlos; Benevides, Leandro Jesus; Rocha, Flávia; Bagano, Priscilla; Ramos, Rommel; Silva, Artur; Figueiredo, Henrique; Almeida, Sintia; Azevedo, Vasco

    2015-01-01

    We present here the complete genome sequence of Corynebacterium pseudotuberculosis strain 12C, isolated from a sheep abscess in Brazil. The sequencing was performed with the Ion Torrent Personal Genome Machine (PGM) system, a fragment library, and a coverage of ~48-fold. The genome presented is a circular chromosome with 2,337,451 bp in length, 2,119 coding sequences, 12 rRNAs, 49 tRNAs, and a G+C content of 52.83%. PMID:26184935
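
    The "~48-fold" coverage figure can be related to read yield with the usual definition of mean coverage (total sequenced bases divided by genome length). The sketch below uses the genome length from this abstract; the helper names are hypothetical, and treating 48 as an exact value is a simplification, since the abstract gives it only approximately.

```python
GENOME_LENGTH_BP = 2_337_451  # chromosome length reported in the abstract


def fold_coverage(total_bases: int, genome_length: int) -> float:
    """Mean coverage: sequenced bases divided by genome length."""
    return total_bases / genome_length


def bases_needed(coverage: int, genome_length: int) -> int:
    """Total sequenced bases implied by a target mean coverage."""
    return coverage * genome_length


needed = bases_needed(48, GENOME_LENGTH_BP)
print(f"{needed:,} bases give {fold_coverage(needed, GENOME_LENGTH_BP):.0f}-fold coverage")
```

    At 48-fold, this corresponds to roughly 112 million sequenced bases.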

  6. Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 12C

    PubMed Central

    Sousa, Thiago Jesus; Mariano, Diego; Parise, Doglas; Parise, Mariana; Viana, Marcus Vinicius Canário; Guimarães, Luis Carlos; Benevides, Leandro Jesus; Rocha, Flávia; Bagano, Priscilla; Ramos, Rommel; Silva, Artur; Figueiredo, Henrique; Almeida, Sintia

    2015-01-01

    We present here the complete genome sequence of Corynebacterium pseudotuberculosis strain 12C, isolated from a sheep abscess in Brazil. The sequencing was performed with the Ion Torrent Personal Genome Machine (PGM) system, a fragment library, and a coverage of ~48-fold. The genome presented is a circular chromosome with 2,337,451 bp in length, 2,119 coding sequences, 12 rRNAs, 49 tRNAs, and a G+C content of 52.83%. PMID:26184935

  7. Limitations of next-generation genome sequence assembly.

    PubMed

    Alkan, Can; Sajjadian, Saba; Eichler, Evan E

    2011-01-01

    High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution. PMID:21102452

  8. Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens

    PubMed Central

    Staats, Martijn; Erkens, Roy H. J.; van de Vossenberg, Bart; Wieringa, Jan J.; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E.; Bakker, Freek T.

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2 and 71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well. PMID:23922691

  9. Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato

    PubMed Central

    Macias-Muñoz, Aide; Briscoe, Adriana D.

    2014-01-01

    Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145

  10. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

  11. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genetic and genomic analyses of Upland cotton (Gossypium hirsutum) are difficult because it has a complex allotetraploid (AADD; 2n = 4x = 52) genome. Here we sequenced, assembled and analyzed the world's most important cultivated cotton genome with 246.2 gigabase (Gb) clean data obtained using whol...

  12. Complete genome sequences for 59 Burkholderia isolates, both pathogenic and near neighbor.

    PubMed

    Johnson, Shannon L; Bishop-Lilly, Kimberly A; Ladner, Jason T; Daligault, Hajnalka E; Davenport, Karen W; Jaissle, James; Frey, Kenneth G; Koroleva, Galina I; Bruce, David C; Coyne, Susan R; Broomall, Stacey M; Li, Po-E; Teshima, Hazuki; Gibbons, Henry S; Palacios, Gustavo F; Rosenzweig, C Nicole; Redden, Cassie L; Xu, Yan; Minogue, Timothy D; Chain, Patrick S

    2015-01-01

    The genus Burkholderia encompasses both pathogenic (including Burkholderia mallei and Burkholderia pseudomallei, U.S. Centers for Disease Control and Prevention Category B listed), and nonpathogenic Gram-negative bacilli. Here we present full genome sequences for a panel of 59 Burkholderia strains, selected to aid in detection assay development. PMID:25931592

  13. Draft Genome Sequence of Klebsiella pneumoniae UCD-JA29 Isolated from a Patient with Sepsis.

    PubMed

    Alexiev, Alexandra; Coil, David A; Jospin, Guillaume; Eisen, Jonathan A; Adams, Jason Y

    2016-01-01

    Here, we present the 6,155,188-bp draft genome sequence of Klebsiella pneumoniae UCD-JA29, isolated from blood cultures from a patient with sepsis at the University of California, Davis Medical Center in Sacramento, California, USA. PMID:27151785

  14. Complete Genome Sequences for 59 Burkholderia Isolates, Both Pathogenic and Near Neighbor

    PubMed Central

    Bishop-Lilly, Kimberly A.; Ladner, Jason T.; Daligault, Hajnalka E.; Davenport, Karen W.; Jaissle, James; Frey, Kenneth G.; Koroleva, Galina I.; Bruce, David C.; Coyne, Susan R.; Broomall, Stacey M.; Li, Po-E; Teshima, Hazuki; Gibbons, Henry S.; Palacios, Gustavo F.; Rosenzweig, C. Nicole; Redden, Cassie L.; Xu, Yan; Minogue, Timothy D.; Chain, Patrick S.

    2015-01-01

    The genus Burkholderia encompasses both pathogenic (including Burkholderia mallei and Burkholderia pseudomallei, U.S. Centers for Disease Control and Prevention Category B listed), and nonpathogenic Gram-negative bacilli. Here we present full genome sequences for a panel of 59 Burkholderia strains, selected to aid in detection assay development. PMID:25931592

  15. Draft Genome Sequence of Klebsiella pneumoniae UCD-JA29 Isolated from a Patient with Sepsis

    PubMed Central

    Alexiev, Alexandra; Coil, David A.; Jospin, Guillaume; Adams, Jason Y.

    2016-01-01

    Here, we present the 6,155,188-bp draft genome sequence of Klebsiella pneumoniae UCD-JA29, isolated from blood cultures from a patient with sepsis at the University of California, Davis Medical Center in Sacramento, California, USA. PMID:27151785

  16. Complete Genome Sequences for 59 Burkholderia Isolates, Both Pathogenic and Near Neighbor

    DOE PAGESBeta

    Johnson, Shannon L.; Bishop-Lilly, Kimberly A.; Ladner, Jason T.; Daligault, Hajnalka E.; Davenport, Karen W.; Jaissle, James; Frey, Kenneth G.; Koroleva, Galina I.; Bruce, David C.; Coyne, Susan R.; et al

    2015-04-30

    The genus Burkholderia encompasses both pathogenic (including Burkholderia mallei and Burkholderia pseudomallei, U.S. Centers for Disease Control and Prevention Category B listed), and nonpathogenic Gram-negative bacilli. Presented in this document are full genome sequences for a panel of 59 Burkholderia strains, selected to aid in detection assay development.

  17. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  18. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    PubMed Central

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304688

  19. The complete chloroplast genome sequence of Zanthoxylum piperitum.

    PubMed

    Lee, Jonghoon; Lee, Hyeon Ju; Kim, Kyunghee; Lee, Sang-Choon; Sung, Sang Hyun; Yang, Tae-Jin

    2016-09-01

    The complete chloroplast genome sequence of Zanthoxylum piperitum, a plant species with useful aromatic oils in family Rutaceae, was generated in this study by de novo assembly with whole-genome sequence data. The chloroplast genome was 158 154 bp in length with a typical quadripartite structure containing a pair of inverted repeats of 27 644 bp, separated by large single copy and small single copy of 85 340 bp and 17 526 bp, respectively. The chloroplast genome harbored 112 genes consisting of 78 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Phylogenetic analysis of the complete chloroplast genome sequences with those of known relatives revealed that Z. piperitum is most closely related to the Citrus species. PMID:26260183
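
    The region lengths reported in the abstract above can be checked against the stated total. The following is a minimal sketch, assuming the usual quadripartite layout of one large single copy region, one small single copy region and two copies of the inverted repeat; the function and variable names are illustrative and do not come from the cited study.

```python
# Region lengths in bp, as reported in the Z. piperitum abstract.
LSC = 85_340             # large single copy region
SSC = 17_526             # small single copy region
IR = 27_644              # one copy of the inverted repeat
REPORTED_TOTAL = 158_154


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # prints: 158154 True
```

    The same check can be applied to any other chloroplast record that reports all three region lengths.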

  20. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    SciTech Connect

    Abt, Birte; Foster, Brian; Lapidus, Alla L.; Clum, Alicia; Sun, Hui; Pukall, Rudiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops. PMID:20148030

  2. Draft Genome Sequence of Stenotrophomonas maltophilia Strain UV74 Reveals Extensive Variability within Its Genomic Group.

    PubMed

    Conchillo-Solé, Oscar; Yero, Daniel; Coves, Xavier; Huedo, Pol; Martínez-Servat, Sònia; Daura, Xavier; Gibert, Isidre

    2015-01-01

    We report the draft genome sequence of Stenotrophomonas maltophilia UV74, isolated from a vascular ulcer. This draft genome sequence shall contribute to the understanding of the evolution and pathogenicity of this species, particularly regarding isolates of clinical origin. PMID:26067959

  3. The Release 6 reference sequence of the Drosophila melanogaster genome

    PubMed Central

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  4. The Release 6 reference sequence of the Drosophila melanogaster genome.

    PubMed

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  5. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    SciTech Connect

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  6. The complete chloroplast genome sequence of Panax quinquefolius (L.).

    PubMed

    Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Kim, Nam-Hoon; Jang, Woojong; Yang, Tae-Jin

    2016-07-01

    The complete chloroplast genome sequence of Panax quinquefolius, an important medicinal herb, was generated by de novo assembly with low-coverage whole-genome sequence data and manual correction. A circular 156 088-bp chloroplast genome showed typical chloroplast genome structure comprising a large single copy region of 86 095 bp, a small single copy region of 17 993 bp, and a pair of inverted repeats of 26 000 bp. The chloroplast genome had 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Phylogenetic analysis with the chloroplast genome revealed that P. quinquefolius is much closer to P. ginseng than P. notoginseng. PMID:26162051

  7. Complete genome sequences of six strains of the genus Methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; Farhan Ul Haque, Muhammad; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Aguero, Fernan; Goodwin, Lynne A.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  8. Genome sequence of the halophilic archaeon Halococcus hamelinensis.

    PubMed

    Burns, Brendan P; Gudhka, Reema K; Neilan, Brett A

    2012-04-01

    Halococcus hamelinensis was isolated from hypersaline stromatolites in Shark Bay, Australia. Here we report the genome sequence (3,133,046 bp) of H. hamelinensis, which provides insights into the ecology, evolution, and adaptation of this novel microorganism. PMID:22461544

  9. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    SciTech Connect

    Martinez, Robert J; Bruce, David; Detter, J C; Goodwin, Lynne A.; Han, James; Han, Cliff; Held, Brittany; Land, Miriam L; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  10. Draft Genome Sequences of Three Mycobacterium chimaera Respiratory Isolates

    PubMed Central

    Roycroft, Emma; Raftery, Philomena; Mok, Simone; Fitzgibbon, Margaret; Rogers, Thomas R.

    2015-01-01

    Mycobacterium chimaera is an opportunistic human pathogen implicated in both pulmonary and cardiovascular infections. Here, we report the draft genome sequences of three strains isolated from human respiratory specimens. PMID:26634757

  11. Genome sequence of the fish pathogen Flavobacterium columnare ATCC 49512

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Flavobacterium columnare is a Gram-negative, rod shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512. ...

  12. Complete Genome Sequence of Mycobacterium phlei Type Strain RIVM601174

    PubMed Central

    Rashid, Mamoon; Adroub, Sabir A.; Arnoux, Marc; Ali, Shahjahan; van Soolingen, Dick; Bitter, Wilbert

    2012-01-01

    Mycobacterium phlei is a rapidly growing nontuberculous Mycobacterium species that is typically nonpathogenic, with few reported cases of human disease. Here we report the whole genome sequence of M. phlei type strain RIVM601174. PMID:22628511

  13. Draft Genome Sequences of Gammaproteobacterial Methanotrophs Isolated from Marine Ecosystems.

    PubMed

    Flynn, James D; Hirayama, Hisako; Sakai, Yasuyoshi; Dunfield, Peter F; Klotz, Martin G; Knief, Claudia; Op den Camp, Huub J M; Jetten, Mike S M; Khmelenina, Valentina N; Trotsenko, Yuri A; Murrell, J Colin; Semrau, Jeremy D; Svenning, Mette M; Stein, Lisa Y; Kyrpides, Nikos; Shapiro, Nicole; Woyke, Tanja; Bringel, Françoise; Vuilleumier, Stéphane; DiSpirito, Alan A; Kalyuzhnaya, Marina G

    2016-01-01

    The genome sequences of Methylobacter marinus A45, Methylobacter sp. strain BBA5.1, and Methylomarinum vadi IT-4 were obtained. These aerobic methanotrophs are typical members of coastal and hydrothermal vent marine ecosystems. PMID:26798114

  14. Draft Genome Sequences of Gammaproteobacterial Methanotrophs Isolated from Marine Ecosystems

    PubMed Central

    Flynn, James D.; Hirayama, Hisako; Sakai, Yasuyoshi; Dunfield, Peter F.; Knief, Claudia; Op den Camp, Huub J. M.; Jetten, Mike S. M.; Khmelenina, Valentina N.; Trotsenko, Yuri A.; Murrell, J. Colin; Semrau, Jeremy D.; Svenning, Mette M.; Stein, Lisa Y.; Kyrpides, Nikos; Shapiro, Nicole; Woyke, Tanja; Bringel, Françoise; Vuilleumier, Stéphane; DiSpirito, Alan A.

    2016-01-01

    The genome sequences of Methylobacter marinus A45, Methylobacter sp. strain BBA5.1, and Methylomarinum vadi IT-4 were obtained. These aerobic methanotrophs are typical members of coastal and hydrothermal vent marine ecosystems. PMID:26798114

  15. Complete Genome Sequences of Six Strains of the Genus Methylobacterium

    SciTech Connect

    Marx, Christopher J; Bringel, Francoise O.; Christoserdova, Ludmila; Moulin, Lionel; UI Hague, Muhammad Farhan; Fleischman, Darrell E.; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E. L.; Nadalig, Thierry; Peyraud, Remi; Roselli, Sandro; Russ, Lina; Goodwin, Lynne A.; Ivanov, Pavel S.; Ivanova, N; Kyrpides, Nikos C; Lajus, Aurelie; Medigue, Claudine; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A.; Vuilleumier, Stephane

    2012-01-01

    The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

  16. Draft Genome Sequence of Pseudomonas syringae pv. persicae NCPPB 2254.

    PubMed

    Zhao, Wenjun; Jiang, Hongshan; Tian, Qian; Hu, Jie

    2015-01-01

    Pseudomonas syringae pv. persicae is a pathogen that causes bacterial decline of stone fruit. Here, we report the draft genome sequence for P. syringae pv. persicae, which was isolated from Prunus persica. PMID:26044420

  17. Complete Genome Sequence of Fish Pathogen Aeromonas hydrophila JBN2301.

    PubMed

    Yang, Wuming; Li, Ningqiu; Li, Ming; Zhang, Defeng; An, Guannan

    2016-01-01

    Aeromonas hydrophila is one of the most important fish pathogens in China. Here, we report the complete genome sequence of a virulent strain, A. hydrophila JBN2301, which was isolated from diseased crucian carp. PMID:26823580

  18. Complete Genome Sequence of Fish Pathogen Aeromonas hydrophila JBN2301

    PubMed Central

    Yang, Wuming; Li, Ming; Zhang, Defeng; An, Guannan

    2016-01-01

    Aeromonas hydrophila is one of the most important fish pathogens in China. Here, we report the complete genome sequence of a virulent strain, A. hydrophila JBN2301, which was isolated from diseased crucian carp. PMID:26823580

  19. Sequence analysis of the complete mitochondrial genome of Youxian sheldrake.

    PubMed

    He, Shao-Ping; Liu, Li-Li; Yu, Qi-Fang; Li, Si; He, Jian-Hua

    2016-03-01

    The Youxian sheldrake is an excellent native breed of Hunan province, China. The complete mitochondrial (mt) genome sequence plays an important role in the accurate determination of phylogenetic relationships among metazoans. This is the first study to determine the complete mitochondrial genome sequence of the Youxian sheldrake using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail. The total length of the mitogenome is 16,605 bp, with a base composition of 29.21% A, 22.18% T, 32.84% C, and 15.77% G. It contains 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of the Youxian sheldrake provides important data for further study of the phylogenetics of poultry, as well as for genetics and breeding. PMID:25090395

  20. Draft Genome Sequence of Mycobacterium fortuitum Isolated from Murine Brain

    PubMed Central

    Singh, Alok Kumar; Karaulia, Pratiksha

    2016-01-01

    Mycobacterium fortuitum subsp. fortuitum ATCC 6841 is a type and standard laboratory testing quality control strain. We report here the completed draft genome sequence for a strain isolated from the brains of M. fortuitum-infected mice. PMID:27034497

  1. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    PubMed

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus. PMID:25252813

  2. Complete genome sequence of Treponema pallidum strain DAL-1

    PubMed Central

    Zobaníková, Marie; Mikolka, Pavol; Čejková, Darina; Pospíšilová, Petra; Chen, Lei; Strouhal, Michal; Qin, Xiang; Weinstock, George M.; Šmajs, David

    2012-01-01

    Treponema pallidum strain DAL-1 is a human uncultivable pathogen causing the sexually transmitted disease syphilis. Strain DAL-1 was isolated from the amniotic fluid of a pregnant woman in the secondary stage of syphilis. Here we describe the 1,139,971 bp long genome of T. pallidum strain DAL-1 which was sequenced using two independent sequencing methods (454 pyrosequencing and Illumina). In rabbits, strain DAL-1 replicated better than the T. pallidum strain Nichols. The comparison of the complete DAL-1 genome sequence with the Nichols sequence revealed a list of genetic differences that are potentially responsible for the increased rabbit virulence of the DAL-1 strain. PMID:23449808

  3. Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations

    PubMed Central

    Hahn, Matthew W.; Zhang, Simo V.; Moyle, Leonie C.

    2014-01-01

    Current de novo whole-genome sequencing approaches often are inadequate for organisms lacking substantial preexisting genetic data. Problems with these methods are manifest as: large numbers of scaffolds that are not ordered within chromosomes or assigned to individual chromosomes, misassembly of allelic sequences as separate loci when the individual(s) being sequenced are heterozygous, and the collapse of recently duplicated sequences into a single locus, regardless of levels of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences—which we call recombinant population genome construction—that solves many of the problems encountered in standard genome assembly and that can be applied in model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to simultaneously barcode and sequence a large number of individuals from a recombinant population. The sequences of all recombinants can be combined to create an initial de novo assembly, followed by the use of individual recombinant genotypes to correct assembly splitting/collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can rapidly accelerate the transformation of nonmodel species into genome-enabled systems by simultaneously producing a high-quality genome assembly and providing genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, this approach also enables simultaneous mapping of quantitative trait loci. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that the method can produce a high-fidelity, high-quality genome assembly for both parents of the cross. PMID:24531727

  4. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.

  5. Draft genome sequence of Thermincola potens strain JR

    SciTech Connect

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  6. Arabidopsis genomic information for interpreting wheat EST sequences.

    PubMed

    Clarke, Bryan; Lambrecht, Mark; Rhee, Seung Y

    2003-03-01

    The resources available from Arabidopsis thaliana for interpreting functional attributes of wheat EST are reviewed. A focus for the review is a comparison between wheat EST sequences, generated from developing endosperm tissue, and the complete genomic sequence from Arabidopsis. The available information indicates that not only can tentative annotations be assigned to many wheat genes but also putative or unknown Arabidopsis gene annotations can be improved by comparative genomics. PMID:12590341

  7. Genome sequence of the cultivated cotton Gossypium arboreum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cotton is one of the most economically important natural fiber crops in the world, and the complex tetraploid nature of its genome (AADD, 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled 98.3% of the 1.7-gigabase G. arboreum (AA, 2n = 26...

  8. Complete Genome Sequence of Cyanobacterium Leptolyngbya sp. NIES-3755

    PubMed Central

    Fujisawa, Takatomo; Ohtsubo, Yoshiyuki; Katayama, Mitsunori; Misawa, Naomi; Wakazuki, Sachiko; Shimura, Yohei; Nakamura, Yasukazu; Kawachi, Masanobu; Yoshikawa, Hirofumi; Eki, Toshihiko

    2016-01-01

    Cyanobacterial genus Leptolyngbya comprises genetically diverse species, but the availability of their complete genome information is limited. Here, we isolated Leptolyngbya sp. strain NIES-3755 from soil at the Toyohashi University of Technology, Japan. We determined the complete genome sequence of the NIES-3755 strain, which is composed of one chromosome and three plasmids. PMID:26988037

  9. A snapshot of the emerging tomato genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

  10. Complete Genome Sequence of Klebsiella pneumoniae YH43.

    PubMed

    Iwase, Tadayuki; Ogura, Yoshitoshi; Hayashi, Tetsuya; Mizunoe, Yoshimitsu

    2016-01-01

    We report here the complete genome sequence of Klebsiella pneumoniae strain YH43, isolated from sweet potato. The genome consists of a single circular chromosome of 5,520,319 bp in length. It carries 8 copies of rRNA operons, 86 tRNA genes, 5,154 protein-coding genes, and the nif gene cluster for nitrogen fixation. PMID:27081127

  11. Complete Genome Sequence of Mycoplasma synoviae Strain WVU 1853T

    PubMed Central

    Kutish, Gerald F.; Barbet, Anthony F.; Michaels, Dina L.

    2015-01-01

    A hybrid sequence assembly of the complete Mycoplasma synoviae type strain WVU 1853T genome was compared to that of strain MS53. The findings support prior conclusions about M. synoviae, based on the genome of that otherwise uncharacterized field strain, and provide the first evidence of epigenetic modifications in M. synoviae. PMID:26021934

  12. Draft Genome Sequence of Linfuranone Producer Microbispora sp. GMKU 363.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Thamchaipenet, Arinthip; Igarashi, Yasuhiro

    2015-01-01

    Here, we report the draft genome sequence of Microbispora sp. GMKU 363, a plant-derived actinomycete that produces linfuranone A, a linear polyketide modified with a furanone ring possessing adipocyte differentiation inducing activity. The biosynthetic gene cluster for linfuranone was identified by analyzing polyketide synthase genes in the genome. PMID:26659694

  13. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    PubMed Central

    Neave, Matthew J.; Michell, Craig T.

    2014-01-01

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp. PMID:25125646

  14. Complete Mitochondrial Genome Sequence of the Pezizomycete Pyronema confluens

    PubMed Central

    2016-01-01

    The complete mitochondrial genome of the ascomycete Pyronema confluens has been sequenced. The circular genome has a size of 191 kb and contains 48 protein-coding genes, 26 tRNA genes, and two rRNA genes. Of the protein-coding genes, 14 encode conserved mitochondrial proteins, and 31 encode predicted homing endonuclease genes. PMID:27174271

  15. First Complete Genome Sequence of Felis catus Gammaherpesvirus 1

    PubMed Central

    Lee, Justin S.; Vuyisich, Momchilo; Chain, Patrick; Lo, Chien-Chi; Kronmiller, Brent; Bracha, Shay; Avery, Anne C.; VandeWoude, Sue

    2015-01-01

    We sequenced the complete genome of Felis catus gammaherpesvirus 1 (FcaGHV1) from lymph node DNA of an infected cat. The genome includes a 121,556-nucleotide unique region with 87 predicted open reading frames (61 gammaherpesvirus conserved and 26 unique) flanked by multiple copies of a 966-nucleotide terminal repeat. PMID:26543105

  16. Draft Genome Sequence of Mycobacterium lentiflavum CSUR P1491

    PubMed Central

    Phelippeau, Michael; Croce, Olivier; Robert, Catherine; Raoult, Didier

    2015-01-01

    We announce the draft genome sequence of Mycobacterium lentiflavum strain CSUR P1491, a nontuberculous mycobacterium responsible for opportunistic potentially life-threatening infections in immunocompromised patients. The genome described here comprises a 6,818,507-bp chromosome exhibiting a 65.75% G+C content, 6,354 protein-coding genes, and 75 RNA genes. PMID:26205866
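
    Several of the records in this listing report a G+C content alongside the genome length. As a minimal sketch of how such a percentage can be computed from a nucleotide sequence, the following Python function counts G and C among the unambiguous bases. The function name and the toy sequence are illustrative assumptions and are not taken from any of the cited genomes.

```python
def gc_content(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so N and other
    ambiguity codes do not affect the result.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy sequence, used only to exercise the function.
toy_sequence = "GGCCATNN"
print(f"G+C content: {gc_content(toy_sequence):.2f}%")
```

    Ignoring ambiguous bases is one common convention; a pipeline that counts them in the denominator would report slightly lower values for draft assemblies that contain gaps.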

  17. Draft Genome Sequence of Mycobacterium triplex DSM 44626

    PubMed Central

    Sassi, Mohamed; Croce, Olivier; Robert, Catherine; Raoult, Didier

    2014-01-01

    We announce the draft genome sequence of Mycobacterium triplex strain DSM 44626, a nontuberculosis species responsible for opportunistic infections. The genome described here is composed of 6,382,840 bp, with a G+C content of 66.57%, and contains 5,988 protein-coding genes and 81 RNA genes. PMID:24874681

  18. Draft Genome Sequence of Mycobacterium europaeum Strain CSUR P1344

    PubMed Central

    Phelippeau, Michael; Croce, Olivier; Robert, Catherine; Raoult, Didier

    2015-01-01

    We report the draft genome sequence of Mycobacterium europaeum strain CSUR P1344, a slowly growing mycobacterium of the Mycobacterium simiae complex and opportunistic respiratory tract colonizer and pathogen. This genome of 6,152,523 bp exhibits a 68.18% G+C content, encoding 5,814 predicted proteins and 74 RNAs. PMID:26205865

  19. Draft Genome Sequence of Mycobacterium vulneris DSM 45247T

    PubMed Central

    Croce, Olivier; Robert, Catherine; Raoult, Didier

    2014-01-01

    We report the draft genome sequence of Mycobacterium vulneris DSM 45247T strain, an emerging, opportunistic pathogen of the Mycobacterium avium complex. The genome described here is composed of 6,981,439 bp (with a G+C content of 67.14%) and has 6,653 protein-coding genes and 84 predicted RNA genes. PMID:24812218

  20. Draft Genome Sequence of Mycobacterium mageritense DSM 44476T

    PubMed Central

    Croce, Olivier; Robert, Catherine; Raoult, Didier

    2014-01-01

    We report the draft genome sequence of Mycobacterium mageritense strain DSM 44476T (CIP 104973), a nontuberculosis species responsible for various infections. The genome described here is composed of 7,966,608 bp, with a G+C content of 66.95%, and contains 7,675 protein-coding genes and 120 predicted RNA genes. PMID:24786954

  1. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  2. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    PubMed

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus. PMID:26692342
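
    The figures in this abstract can be related by simple arithmetic. The sketch below, in Python, derives the estimated genome length implied by 350 Mb of scaffolds representing 63.7% of the genome. It then computes a nominal sequencing depth under the assumption that the 38.7 Gb figure refers to the total read data; the abstract does not state this explicitly, so that second number is illustrative only.

```python
def estimated_genome_length_mb(assembled_mb, fraction_covered):
    """Genome length implied by an assembly that covers a given fraction."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembled_mb / fraction_covered


# Values quoted in the abstract.
scaffold_total_mb = 350.0
fraction_of_genome = 0.637

genome_mb = estimated_genome_length_mb(scaffold_total_mb, fraction_of_genome)

# Assumption: the 38.7 Gb of nucleotides is the total amount of read data.
read_data_mb = 38.7 * 1000
raw_depth = read_data_mb / genome_mb

print(f"Implied genome length: {genome_mb:.0f} Mb")
print(f"Nominal depth under the stated assumption: {raw_depth:.1f}x")
```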

  3. MAIZE CHLOROTIC DWARF VIRUS GENOME SEQUENCE AND POLYPROTEIN CLEAVAGE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomic sequence (11.8 kb) of the severe Ohio Maize chlorotic dwarf virus isolate (MCDV-S, genus Waikavirus) was determined from overlapping cDNA clones. An approximately 400 kDa polyprotein encoded by the viral genome is post-translationally cleaved into several smaller functional proteins. Wher...

  4. Mitochondrial Genome Sequence of the Glass Sponge Oopsacas minuta

    PubMed Central

    Jourda, Cyril; Santini, Sébastien; Rocher, Caroline; Le Bivic, André

    2015-01-01

    We report the complete mitochondrial genome sequence of the Mediterranean glass sponge Oopsacas minuta. This 19-kb mitochondrial genome has 24 noncoding genes (22 tRNAs and 2 rRNAs) and 14 protein-encoding genes coding for 11 subunits of respiratory chain complexes and 3 ATP synthase subunits. PMID:26227597

  5. Draft Genome Sequence of Rhodococcus sp. Strain 311R

    PubMed Central

    Ehsani, Elham; Jauregui, Ruy; Geffers, Robert; Jareck, Michael; Boon, Nico; Pieper, Dietmar H.

    2015-01-01

    Here, we report the draft genome sequence of Rhodococcus sp. strain 311R, which was isolated from a site contaminated with alkanes and aromatic compounds. Strain 311R shares 90% of the genome of Rhodococcus erythropolis SK121, which is the most closely related bacterium. PMID:25999565

  6. Complete Genome Sequence of Mycobacterium bovis Strain BCG-1 (Russia)

    PubMed Central

    Shitikov, Egor A.; Malakhova, Maja V.; Kostryukova, Elena S.; Ilina, Elena N.; Atrasheuskaya, Alena V.; Ignatyev, Georgy M.; Vinokurova, Nataliya V.; Gorbachyov, Vyacheslav Y.

    2016-01-01

    Mycobacterium bovis BCG (Bacille Calmette-Guérin) is a vaccine strain used for protection against tuberculosis. Here, we announce the complete genome sequence of M. bovis strain BCG-1 (Russia). Extensive use of this strain necessitates the study of its genome stability by comparative analysis. PMID:27034492

  7. Complete Genome Sequence of Mycobacterium bovis Strain BCG-1 (Russia).

    PubMed

    Sotnikova, Evgeniya A; Shitikov, Egor A; Malakhova, Maja V; Kostryukova, Elena S; Ilina, Elena N; Atrasheuskaya, Alena V; Ignatyev, Georgy M; Vinokurova, Nataliya V; Gorbachyov, Vyacheslav Y

    2016-01-01

    Mycobacterium bovis BCG (Bacille Calmette-Guérin) is a vaccine strain used for protection against tuberculosis. Here, we announce the complete genome sequence of M. bovis strain BCG-1 (Russia). Extensive use of this strain necessitates the study of its genome stability by comparative analysis. PMID:27034492

  8. The tomato genome sequence provides insight into fleshy fruit evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the inbred tomato cultivar ‘Heinz 1706’ was sequenced and assembled using a combination of Sanger and “next generation” technologies. The predicted genome size is ~900 Mb, consistent with prior estimates, of which 760 Mb were assembled in 91 scaffolds aligned to the 12 tomato chromosom...

  9. Genome Sequence of Fusarium graminearum Isolate CS3005

    PubMed Central

    Stiller, Jiri; Kazan, Kemal

    2014-01-01

    Fusarium graminearum is one of the most important fungal pathogens of wheat, barley, and maize worldwide. This announcement reports the genome sequence of a highly virulent Australian isolate of this species to supplement the existing genome of the North American F. graminearum isolate Ph1. PMID:24744326

  10. Draft Genome Sequence of Linfuranone Producer Microbispora sp. GMKU 363

    PubMed Central

    Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Thamchaipenet, Arinthip; Igarashi, Yasuhiro

    2015-01-01

    Here, we report the draft genome sequence of Microbispora sp. GMKU 363, a plant-derived actinomycete that produces linfuranone A, a linear polyketide modified with a furanone ring possessing adipocyte differentiation inducing activity. The biosynthetic gene cluster for linfuranone was identified by analyzing polyketide synthase genes in the genome. PMID:26659694

  11. Complete genome sequence of pronghorn virus, a pestivirus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  12. Genome Sequence of Type Strain Lysinibacillus macroides DSM 54T

    PubMed Central

    Liu, Guo-hong; Wang, Jie-ping; Che, Jian-Mei; Chen, Qian-Qian; Chen, Zheng; Ge, Ci-bin

    2015-01-01

    Lysinibacillus macroides DSM 54T is a Gram-positive, spore-forming bacterium. Here, we report the 4,866,035-bp genome sequence of Lysinibacillus macroides DSM 54T, which will accelerate the application of degrading xylan and provide useful information for genomic taxonomy and phylogenomics of Bacillus-like bacteria. PMID:26543111

  13. Complete genome sequence of Aeromonas hydrophila AL06-06

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aeromonas hydrophila occurs in freshwater environments and infects fish and mammals. In this work, we report the complete genome sequence of Aeromonas hydrophila AL06-06, which was isolated from diseased goldfish and is being used for comparative genomic studies with A. hydrophila strains causing ba...

  14. Genome Sequence of Type Strain Lysinibacillus macroides DSM 54T.

    PubMed

    Liu, Guo-Hong; Liu, Bo; Wang, Jie-Ping; Che, Jian-Mei; Chen, Qian-Qian; Chen, Zheng; Ge, Ci-Bin

    2015-01-01

    Lysinibacillus macroides DSM 54(T) is a Gram-positive, spore-forming bacterium. Here, we report the 4,866,035-bp genome sequence of Lysinibacillus macroides DSM 54(T), which will accelerate the application of degrading xylan and provide useful information for genomic taxonomy and phylogenomics of Bacillus-like bacteria. PMID:26543111

  15. Draft genome sequence of Phomopsis longicolla MSPL 10-6

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phomopsis longicolla T.W. Hobbs is the primary cause of Phomopsis seed decay in soybean. We report the de novo assembled draft genome sequence of P. longicolla isolate MSPL10-6 with a 54.8-fold depth of coverage. The resulting draft genome was estimated to be approximately 64 Mb in size with an over...

  16. Draft Genome Sequence of Mycobacterium cosmeticum DSM 44829

    PubMed Central

    Croce, Olivier; Robert, Catherine; Raoult, Didier

    2014-01-01

    We announce the draft genome sequence of Mycobacterium cosmeticum strain DSM 44829, a nontuberculous species responsible for opportunistic infection. The genome described here is composed of 6,462,090 bp, with a G+C content of 68.24%. It contains 6,281 protein-coding genes and 75 predicted RNA genes. PMID:24723727

  17. Complete Genome Sequence of Cyanobacterium Leptolyngbya sp. NIES-3755.

    PubMed

    Hirose, Yuu; Fujisawa, Takatomo; Ohtsubo, Yoshiyuki; Katayama, Mitsunori; Misawa, Naomi; Wakazuki, Sachiko; Shimura, Yohei; Nakamura, Yasukazu; Kawachi, Masanobu; Yoshikawa, Hirofumi; Eki, Toshihiko; Kanesaki, Yu

    2016-01-01

    Cyanobacterial genus Leptolyngbya comprises genetically diverse species, but the availability of their complete genome information is limited. Here, we isolated Leptolyngbya sp. strain NIES-3755 from soil at the Toyohashi University of Technology, Japan. We determined the complete genome sequence of the NIES-3755 strain, which is composed of one chromosome and three plasmids. PMID:26988037

  18. Genome Sequence of Bacillus sp. Strain FJAT-14515

    PubMed Central

    Liu, Guohong; Tang, Weiqi; Che, Jianmei; Lin, Yingzhi; Zhu, Yujing; Su, Mingxing; Tang, Jianyang

    2014-01-01

    We report the draft genome sequence of Bacillus sp. strain FJAT-14515. The genome is 5.44 Mb in length. It covers 5,263 genes with an average length of 791 bp, has a G+C value of 37.06%, and contains 67 tRNAs, 31 small RNAs, and 5 rRNA loci. PMID:24459256

  19. Whole-Genome Sequence of Staphylococcus epidermidis Tü3298

    PubMed Central

    Moran, Josephine C.

    2016-01-01

    Staphylococcus epidermidis Tü3298 is a frequently used laboratory strain, known for its production of epidermin and absence of the icaABCD operon. We report the whole-genome sequence of this strain, a 2.5-Mb genome containing 2,332 genes. PMID:26966218

  20. Whole-Genome Sequence of Staphylococcus epidermidis Tü3298.

    PubMed

    Moran, Josephine C; Horsburgh, Malcolm J

    2016-01-01

    Staphylococcus epidermidis Tü3298 is a frequently used laboratory strain, known for its production of epidermin and absence of the icaABCD operon. We report the whole-genome sequence of this strain, a 2.5-Mb genome containing 2,332 genes. PMID:26966218

  1. Salmonella Serotype Determination Utilizing High-Throughput Genome Sequencing Data

    PubMed Central

    Zhang, Shaokang; Yin, Yanlong; Jones, Marcus B.; Zhang, Zhenzhen; Deatherage Kaiser, Brooke L.; Dinsmore, Blake A.; Fitzgerald, Collette; Fields, Patricia I.

    2015-01-01

    Serotyping forms the basis of national and international surveillance networks for Salmonella, one of the most prevalent foodborne pathogens worldwide (1–3). Public health microbiology is currently being transformed by whole-genome sequencing (WGS), which opens the door to serotype determination using WGS data. SeqSero (www.denglab.info/SeqSero) is a novel Web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies. The performance of SeqSero was evaluated by testing (i) raw reads from genomes of 308 Salmonella isolates of known serotype; (ii) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and (iii) 354 other publicly available draft or complete Salmonella genomes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen. SeqSero can help to maintain the well-established utility of Salmonella serotyping when integrated into a platform of WGS-based pathogen subtyping and characterization. PMID:25762776

  2. Sequencing a new target genome: the Boophilus microplus (Acari: Ixodidae) genome project.

    PubMed

    Guerrero, Felix D; Nene, Vishvanath M; George, John E; Barker, Stephen C; Willadsen, Peter

    2006-01-01

    The southern cattle tick, Boophilus microplus (Canestrini), causes annual economic losses in the hundreds of millions of dollars to cattle producers throughout the world, and ranks as the most economically important tick from a global perspective. Control failures attributable to the development of pesticide resistance have become commonplace, and novel control technologies are needed. The availability of the genome sequence will facilitate the development of these new technologies, and we are proposing sequencing to a 4-6X draft coverage. Many existing biological resources are available to facilitate a genome sequencing project, including several inbred laboratory tick strains, a database of approximately 45,000 expressed sequence tags compiled into a B. microplus Gene Index, a bacterial artificial chromosome (BAC) library, an established B. microplus cell line, and genomic DNA suitable for library synthesis. Collaborative projects are underway to map BACs and cDNAs to specific chromosomes and to sequence selected BAC clones. When completed, the genome sequences from the cow, B. microplus, and the B. microplus-borne pathogens Babesia bovis and Anaplasma marginale will enhance studies of host-vector-pathogen systems. Genes involved in the regeneration of amputated tick limbs and transitions through developmental stages are largely unknown. Studies of these and other interesting biological questions will be advanced by tick genome sequence data. Comparative genomics offers the prospect of new insight into many, perhaps all, aspects of the biology of ticks and the pathogens they transmit to farm animals and people. The B. microplus genome sequence will fill a major gap in comparative genomics: a sequence from the Metastriata lineage of ticks. The purpose of the article is to synergize interest in and provide rationales for sequencing the genome of B. microplus and for publicizing currently available genomic resources for this tick. PMID:16506442

  3. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    PubMed

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with the N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021

  4. Genomic distribution of simple sequence repeats in Brassica rapa.

    PubMed

    Hong, Chang Pyo; Piao, Zhong Yun; Kang, Tae Wook; Batley, Jacqueline; Yang, Tae-Jin; Hur, Yoon-Kang; Bhak, Jong; Park, Beom-Seok; Edwards, David; Lim, Yong Pyo

    2007-06-30

    Simple Sequence Repeats (SSRs) represent short tandem duplications found within all eukaryotic organisms. To examine the distribution of SSRs in the genome of Brassica rapa ssp. pekinensis, SSRs from different genomic regions representing 17.7 Mb of genomic sequence were surveyed. SSRs appear more abundant in non-coding regions (86.6%) than in coding regions (13.4%). Comparison of SSR densities in different genomic regions demonstrated that SSR density was greatest within the 5'-flanking regions of the predicted genes. The proportion of different repeat motifs varied between genomic regions, with trinucleotide SSRs more prevalent in predicted coding regions, reflecting the codon structure in these regions. SSRs were also preferentially associated with gene-rich regions, with peri-centromeric heterochromatin SSRs mostly associated with retrotransposons. These results indicate that the distribution of SSRs in the genome is non-random. Comparison of SSR abundance between B. rapa and the closely related species Arabidopsis thaliana suggests a greater abundance of SSRs in B. rapa, which may be due to the proposed genome triplication. Our results provide a comprehensive view of SSR genomic distribution and evolution in Brassica for comparison with the sequenced genomes of A. thaliana and Oryza sativa. PMID:17646709
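
    To make the notion of an SSR concrete, here is a minimal Python sketch that locates perfect tandem repeats of a fixed motif length with a regular expression. It is not the method used in the study, which the abstract does not describe; the function name, the repeat thresholds, and the toy sequence are all assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, motif_len, min_repeats):
    """Find perfect tandem repeats of motifs of length motif_len.

    Returns a list of (start, motif, repeat_count) tuples, with start
    given as a 0-based position. Motifs made of a single repeated base
    (for example "AA") are skipped, because they are homopolymer runs
    rather than di- or trinucleotide repeats.
    """
    seq = sequence.upper()
    # A motif of the requested length, followed by at least
    # (min_repeats - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Hypothetical toy sequence: five GA repeats and three ATC repeats.
toy = "TT" + "GA" * 5 + "GGG" + "ATC" * 3 + "ATG"
print(find_ssrs(toy, motif_len=2, min_repeats=4))
print(find_ssrs(toy, motif_len=3, min_repeats=3))
```

    Because the regular expression reports the leftmost match, the motif is named by the phase in which the repeat is first encountered. Dedicated SSR tools usually normalise motifs (for example, treating GA and AG as the same class) before computing densities per genomic region.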

  5. Clinical Interpretation and Implications of Whole-Genome Sequencing

    PubMed Central

    Dewey, Frederick E.; Grove, Megan E.; Pan, Cuiping; Goldstein, Benjamin A.; Bernstein, Jonathan A.; Chaib, Hassan; Merker, Jason D.; Goldfeder, Rachel L.; Enns, Gregory M.; David, Sean P.; Pakdaman, Neda; Ormond, Kelly E.; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E.; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T.; Butte, Atul J.; Ford, James M.; Boxer, Linda; Ioannidis, John P. A.; Yeung, Alan C.; Altman, Russ B.; Assimes, Themistocles L.; Snyder, Michael; Ashley, Euan A.; Quertermous, Thomas

    2014-01-01

    IMPORTANCE Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. OBJECTIVES To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. DESIGN, SETTING, AND PARTICIPANTS An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. MAIN OUTCOMES AND MEASURES Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. RESULTS Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). CONCLUSIONS AND RELEVANCE In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine. PMID:24618965
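
    The abstract reports interrater agreement as a Fleiss κ. As a hedged illustration of what that statistic measures, the following Python sketch implements the standard Fleiss κ formula for a table of rating counts. The function name and the example ratings are hypothetical; none of the numbers below come from the study.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table of rating counts.

    table[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(table[0])

    # Share of all ratings that fall in each category.
    p_j = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 findings, 5 raters, categories (follow up, do not).
ratings = [[5, 0], [3, 2], [1, 4], [2, 3]]
print(f"Fleiss kappa: {fleiss_kappa(ratings):.2f}")
```

    A value of 1 indicates complete agreement and a value near 0 indicates agreement no better than chance, which is why the reported 0.24 is described as fair.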

  6. iJGVD: an integrative Japanese genome variation database based on whole-genome sequencing

    PubMed Central

    Yamaguchi-Kabata, Yumi; Nariai, Naoki; Kawai, Yosuke; Sato, Yukuto; Kojima, Kaname; Tateno, Minoru; Katsuoka, Fumiki; Yasuda, Jun; Yamamoto, Masayuki; Nagasaki, Masao

    2015-01-01

    The integrative Japanese Genome Variation Database (iJGVD; http://ijgvd.megabank.tohoku.ac.jp/) provides genomic variation data detected by whole-genome sequencing (WGS) of Japanese individuals. Specifically, the database contains variants detected by WGS of 1,070 individuals who participated in a genome cohort study of the Tohoku Medical Megabank Project. In the first release, iJGVD includes >4,300,000 autosomal single nucleotide variants (SNVs) whose minor allele frequencies are >5.0%.
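
    The release criterion described here (minor allele frequency above 5.0% among 1,070 individuals) can be illustrated with a short Python sketch. It assumes diploid autosomal genotypes, so 1,070 individuals contribute 2,140 alleles per site. The function names and the allele counts are hypothetical and do not reflect the database's actual pipeline.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele counts are inconsistent")
    freq = alt_allele_count / total_alleles
    return min(freq, 1.0 - freq)


def passes_maf_filter(alt_allele_count, n_individuals, threshold=0.05):
    """True if the minor allele frequency is strictly above the threshold.

    Assumes diploid autosomal sites, i.e. two alleles per individual.
    """
    total_alleles = 2 * n_individuals
    return minor_allele_frequency(alt_allele_count, total_alleles) > threshold


# Hypothetical alternate-allele counts in a cohort of 1,070 individuals.
for count in (100, 108, 2100):
    print(count, passes_maf_filter(count, n_individuals=1070))
```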

  7. Genome sequence of the date palm Phoenix dactylifera L

    PubMed Central

    Al-Mssallem, Ibrahim S.; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M.; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O.; Jia, Shangang; Yin, An; Alhuzimi, Eman M.; Alsaihati, Burair A.; Al-Owayyed, Saad A.; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A.; Sun, Gaoyuan; Majrashi, Majed A.; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A.; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F.; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R.; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

    2013-01-01

    Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm’s unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants. PMID:23917264

  8. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
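
    The abstract refers to the contig N50 as a measure of assembly quality. As a minimal sketch of the usual definition, the Python function below returns the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths in the example are hypothetical and are not marmoset data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half of the
        # total length defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # prints 70
```

    Merging unmapped contigs into longer chromosomal contigs shifts more of the total length into long sequences, which is how a resequencing effort of this kind can raise the N50.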

  9. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576

  10. Accessing complex crop genomes with next-generation sequencing.

    PubMed

    Edwards, David; Batley, Jacqueline; Snowdon, Rod J

    2013-01-01

    Many important crop species have genomes originating from ancestral or recent polyploidisation events. Multiple homoeologous gene copies, chromosomal rearrangements and amplification of repetitive DNA within large and complex crop genomes can considerably complicate genome analysis and gene discovery by conventional, forward genetics approaches. On the other hand, ongoing technological advances in molecular genetics and genomics today offer unprecedented opportunities to analyse and access even more recalcitrant genomes. In this review, we describe next-generation sequencing and data analysis techniques that vastly improve our ability to dissect and mine genomes for causal genes underlying key traits and allelic variation of interest to breeders. We focus primarily on wheat and oilseed rape, two leading examples of major polyploid crop genomes whose size or complexity present different, significant challenges. In both cases, the latest DNA sequencing technologies, applied using quite different approaches, have enabled considerable progress towards unravelling the respective genomes. Our ability to discover the extent and distribution of genetic diversity in crop gene pools, and its relationship to yield and quality-related traits, is swiftly gathering momentum as DNA sequencing and the bioinformatic tools to deal with growing quantities of genomic data continue to develop. In the coming decade, genomic and transcriptomic sequencing, discovery and high-throughput screening of single nucleotide polymorphisms, presence-absence variations and other structural chromosomal variants in diverse germplasm collections will give detailed insight into the origins, domestication and available trait-relevant variation of polyploid crops, in the process facilitating novel approaches and possibilities for genomics-assisted breeding. PMID:22948437

  11. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing

    PubMed Central

    2010-01-01

    Background: Variations in genome size within and between species have been observed since the 1950s in diverse taxonomic groups. Serving as model organisms, smooth pufferfish possess the smallest vertebrate genomes. Interestingly, spiny pufferfish from the sister family have genomes twice as large as those of smooth pufferfish. Therefore, comparative genomic analysis between smooth pufferfish and spiny pufferfish is useful for understanding genome size evolution in pufferfish. Results: Ten BAC clones of a spiny pufferfish, Diodon holocanthus, were randomly selected and shotgun sequenced. In total, 776 kb of non-redundant, gap-free sequence representing 0.1% of the D. holocanthus genome was identified, and 77 distinct genes were predicted. In the sequenced D. holocanthus genome, 364 kb is homologous with 265 kb of the Takifugu rubripes genome, and 223 kb is homologous with 148 kb of the Tetraodon nigroviridis genome. Repetitive DNA accounts for 8% of the sequenced D. holocanthus genome, which is higher than in the T. rubripes genome (6.89%) and the Te. nigroviridis genome (4.66%). Of the repetitive DNA, 76% is retroelements, which account for 6% of the sequenced D. holocanthus genome and belong to known families of transposable elements. More than half of the retroelements were distributed within genes. In the non-homologous regions, the repeat element proportion in the D. holocanthus genome increased to 10.6% compared with T. rubripes and to 9.19% compared with Te. nigroviridis. A comparison of 10 well-defined orthologous genes showed that the average intron size in the D. holocanthus genome (566 bp) is significantly longer than that in the smooth pufferfish genome (435 bp). Conclusion: Compared with the smooth pufferfish, D. holocanthus has a genome with low gene density that is rich in repeat elements. Genome size variation between D. holocanthus and the smooth pufferfish appears as length variation between homologous regions and as different accumulation of non-homologous sequences. The difference in intron length is consistent with the genome size variation between D. holocanthus and the smooth pufferfish. Different transposable element accumulation is responsible for genome size variation between D. holocanthus and the smooth pufferfish. PMID:20569428

  12. Complete genome sequence of Serratia plymuthica strain AS12

    SciTech Connect

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne A.; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla L.; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, J. Chris; Land, Miriam L; Hauser, Loren John; Cheng, Jan-Fang; Ivanova, N; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  13. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis.

    PubMed

    Carlton, Jane M; Hirt, Robert P; Silva, Joana C; Delcher, Arthur L; Schatz, Michael; Zhao, Qi; Wortman, Jennifer R; Bidwell, Shelby L; Alsmark, U Cecilia M; Besteiro, Sébastien; Sicheritz-Ponten, Thomas; Noel, Christophe J; Dacks, Joel B; Foster, Peter G; Simillion, Cedric; Van de Peer, Yves; Miranda-Saavedra, Diego; Barton, Geoffrey J; Westrop, Gareth D; Müller, Sylke; Dessi, Daniele; Fiori, Pier Luigi; Ren, Qinghu; Paulsen, Ian; Zhang, Hanbang; Bastida-Corcuera, Felix D; Simoes-Barbosa, Augusto; Brown, Mark T; Hayes, Richard D; Mukherjee, Mandira; Okumura, Cheryl Y; Schneider, Rachel; Smith, Alias J; Vanacova, Stepanka; Villalvazo, Maria; Haas, Brian J; Pertea, Mihaela; Feldblyum, Tamara V; Utterback, Terry R; Shu, Chung-Li; Osoegawa, Kazutoyo; de Jong, Pieter J; Hrdy, Ivan; Horvathova, Lenka; Zubacova, Zuzana; Dolezal, Pavel; Malik, Shehre-Banoo; Logsdon, John M; Henze, Katrin; Gupta, Arti; Wang, Ching C; Dunne, Rebecca L; Upcroft, Jacqueline A; Upcroft, Peter; White, Owen; Salzberg, Steven L; Tang, Petrus; Chiu, Cheng-Hsun; Lee, Ying-Shiung; Embley, T Martin; Coombs, Graham H; Mottram, Jeremy C; Tachezy, Jan; Fraser-Liggett, Claire M; Johnson, Patricia J

    2007-01-12

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria. PMID:17218520

  14. Complete genome sequence of Serratia plymuthica strain AS12

    PubMed Central

    Finlay, Roger D.; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

    2012-01-01

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”. PMID:22768360

  15. Complete genome sequence of Ferroglobus placidus AEDII12DO

    SciTech Connect

    Anderson, Iain; Risso, Carla; Holmes, Dawn; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Saunders, Elizabeth H; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Larimer, Frank W; Land, Miriam L; Hauser, Loren John; Woyke, Tanja; Lovley, Derek; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

  16. Complete genome sequence of Serratia plymuthica strain AS12.

    PubMed

    Neupane, Saraswoti; Finlay, Roger D; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

    2012-05-25

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled "Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens". PMID:22768360

  17. Use of metaphors about exome and whole genome sequencing.

    PubMed

    Nelson, Sarah C; Crouch, Julia M; Bamshad, Michael J; Tabor, Holly K; Yu, Joon-Ho

    2016-05-01

    Clinical and research uses of exome and whole genome sequencing (ES/WGS) are growing rapidly. An enhanced understanding of how individuals conceptualize and communicate about sequencing results is needed to ensure effective, mutual exchange of information between care providers and patients and between researchers and participants. Focus group and interview participants were recruited to discuss their attitudes and preferences for receiving hypothetical results from ES/WGS. African Americans were intentionally oversampled. We qualitatively analyzed participants' speech to identify unsolicited metaphorical language pertaining to genes and health, and grouped these occurrences into metaphorical concepts. Participants compared genetic information to physical objects including tools, weapons, contents of boxes, and formal documents or reports. These metaphorical concepts centered on several key themes, including locus of control; containment versus release of information; and desirability, usability, interpretability, and ownership of genetic results. Metaphorical language is often used intentionally or unintentionally in discussions about receiving results from ES/WGS in both clinical and research settings. Awareness of the use of metaphorical language and attention to its varied meanings facilitates effective communication about return of ES/WGS results. In turn, both should foster shared and informed decision-making and improve the translation of genetic information by clinicians and researchers. © 2016 Wiley Periodicals, Inc. PMID:26822973

  18. Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions

    PubMed Central

    Druet, T; Macleod, I M; Hayes, B J

    2014-01-01

    Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, given the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population, and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP) that selected a subset of individuals for sequencing so as to maximize the number of unique haplotypes sequenced (from single-nucleotide polymorphism panel data) gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (600×), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%. PMID:23549338

  19. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  20. Sequencing and comparative analyses of the genomes of zoysiagrasses.

    PubMed

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-04-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica 'Kyoto', Z. japonica 'Miyagi' and Z. matrella 'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp. PMID:26975196

  1. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  2. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon. PMID:25746396

  3. Complete genome sequence of equine herpesvirus type 9.

    PubMed

    Fukushi, Hideto; Yamaguchi, Tsuyoshi; Yamada, Souichi

    2012-12-01

    Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome has a length of 148,371 bp and all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9. PMID:23166237

  4. The complete chloroplast genome sequence of Dieffenbachia seguine (Araceae).

    PubMed

    Wang, Bin; Han, Limin; Chen, Chen; Wang, Zhezhi

    2016-07-01

    The chloroplast genome of Dieffenbachia seguine is the first to be completely sequenced from the genus Dieffenbachia (family Araceae). The genome is 163 699 bp in length, with 36.4% GC content. A pair of inverted repeats (IRs, 25 235 bp) is separated by a large single copy region (LSC, 90 780 bp) and a small single copy region (SSC, 22 449 bp). The chloroplast genome contains 113 unique genes, 88 protein-coding genes, 37 tRNA genes, and four rRNA genes. Of these genes, 16 contain a single intron and two contain two introns. A maximum likelihood phylogenetic analysis using complete chloroplast genomes revealed that Dieffenbachia seguine belongs to the Araceae family of the Arecidae group, which conforms to the traditional classification. PMID:26153749
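    The region sizes reported above can be cross-checked against the stated genome length, assuming the usual quadripartite chloroplast layout in which the inverted repeat occurs twice. The following is a small sketch of that arithmetic; the function name is hypothetical, and the figures are the ones quoted in the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes in bp, as quoted in the abstract.
LSC_BP = 90_780
SSC_BP = 22_449
IR_BP = 25_235

total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # 163699, matching the reported genome size
```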

  5. Long-read sequence assembly of the gorilla genome.

    PubMed

    Gordon, David; Huddleston, John; Chaisson, Mark J P; Hill, Christopher M; Kronenberg, Zev N; Munson, Katherine M; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K; Haussler, David; Chin, Chen-Shan; Eichler, Evan E

    2016-04-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  6. Draft genome sequence of adzuki bean, Vigna angularis.

    PubMed

    Kang, Yang Jae; Satyawan, Dani; Shim, Sangrea; Lee, Taeyoung; Lee, Jayern; Hwang, Won Joo; Kim, Sue K; Lestari, Puji; Laosatit, Kularb; Kim, Kil Hyun; Ha, Tae Joung; Chitikineni, Annapurna; Kim, Moon Young; Ko, Jong-Min; Gwag, Jae-Gyun; Moon, Jung-Kyung; Lee, Yeong-Ho; Park, Beom-Seok; Varshney, Rajeev K; Lee, Suk-Ha

    2015-01-01

    Adzuki bean (Vigna angularis var. angularis) is a dietary legume crop in East Asia. The presumed progenitor (Vigna angularis var. nipponensis) is widely found in East Asia, suggesting speciation and domestication in these temperate climate regions. Here, we report a draft genome sequence of adzuki bean. The genome assembly covers 75% of the estimated genome and was mapped to 11 pseudo-chromosomes. Gene prediction revealed 26,857 high confidence protein-coding genes evidenced by RNAseq of different tissues. Comparative gene expression analysis with V. radiata showed that the tissue specificity of orthologous genes was highly conserved. Additional re-sequencing of wild adzuki bean, V. angularis var. nipponensis, and V. nepalensis, was performed to analyze the variations between cultivated and wild adzuki bean. The determined divergence time of adzuki bean and the wild species predated archaeology-based domestication time. The present genome assembly will accelerate the genomics-assisted breeding of adzuki bean. PMID:25626881

  7. Quantifying Genome Editing Outcomes at Endogenous Loci using SMRT Sequencing

    PubMed Central

    Clark, Joseph; Punjya, Niraj; Sebastiano, Vittorio; Bao, Gang; Porteus, Matthew H

    2014-01-01

    SUMMARY Targeted genome editing with engineered nucleases has transformed the ability to introduce precise sequence modifications at almost any site within the genome. A major obstacle to probing the efficiency and consequences of genome editing is that no existing method enables the frequency of different editing events to be simultaneously measured across a cell population at any endogenous genomic locus. We have developed a novel method for quantifying individual genome editing outcomes at any site of interest using single molecule real time (SMRT) DNA sequencing. We show that this approach can be applied at various loci, using multiple engineered nuclease platforms including TALENs, RNA guided endonucleases (CRISPR/Cas9), and ZFNs, and in different cell lines to identify conditions and strategies in which the desired engineering outcome has occurred. This approach facilitates the evaluation of new gene editing technologies and permits sensitive quantification of editing outcomes in almost every experimental system used. PMID:24685129

  8. Genome Sequence of a Mycoplasma meleagridis Field Strain.

    PubMed

    Rocha, Ticiana S; Bertolotti, Luigi; Catania, Salvatore; Pourquier, Philippe; Rosati, Sergio

    2016-01-01

    Mycoplasma meleagridis is a major cause of disease and economic loss in turkeys. Here, we report the genome sequence of an M. meleagridis field strain, which enlarges the knowledge about this bacterium and helps the identification of possible coding sequences for drug resistance genes and specific antigens. PMID:26941131

  9. Genome Sequence of a Mycoplasma meleagridis Field Strain

    PubMed Central

    Bertolotti, Luigi; Catania, Salvatore; Pourquier, Philippe; Rosati, Sergio

    2016-01-01

    Mycoplasma meleagridis is a major cause of disease and economic loss in turkeys. Here, we report the genome sequence of an M. meleagridis field strain, which enlarges the knowledge about this bacterium and helps the identification of possible coding sequences for drug resistance genes and specific antigens. PMID:26941131

  10. Complete Genome Sequence of the Alfalfa latent virus.

    PubMed

    Nemchinov, Lev G; Shao, Jonathan; Postnikova, Olga A

    2015-01-01

    The first complete genome sequence of the Alfalfa latent carlavirus (ALV) was obtained by primer walking and Illumina RNA sequencing. The virus differs substantially from the Czech ALV isolate and the Pea streak virus isolate from Wisconsin. The absence of a clear nucleic acid-binding protein indicates ALV divergence from other carlaviruses. PMID:25883281

  11. Genome Sequences of Vibrio navarrensis, a Potential Human Pathogen.

    PubMed

    Gladney, Lori M; Katz, Lee S; Knipe, Kristen M; Rowe, Lori A; Conley, Andrew B; Rishishwar, Lavanya; Mariño-Ramírez, Leonardo; Jordan, I King; Tarr, Cheryl L

    2014-01-01

    Vibrio navarrensis is an aquatic bacterium recently shown to be associated with human illness. We report the first genome sequences of three V. navarrensis strains obtained from clinical and environmental sources. Preliminary analyses of the sequences reveal that V. navarrensis contains genes commonly associated with virulence in other human pathogens. PMID:25414502

  12. Draft Genome Sequence of Type Strain Streptococcus gordonii ATCC 10558.

    PubMed

    Rasmussen, Louise H; Dargis, Rimtas; Christensen, Jens Jørgen; Skovgaard, Ole; Nielsen, Xiaohui C

    2016-01-01

    Streptococcus gordonii ATCC 10558(T) was isolated from a patient with infective endocarditis in 1946 and announced as a type strain in 1989. Here, we report the 2,154,510-bp draft genome sequence of S. gordonii ATCC 10558(T). This sequence will contribute to knowledge about the pathogenesis of infective endocarditis. PMID:26893427

  13. PHYTOPHTHORA GENOME SEQUENCES UNCOVER EVOLUTIONARY ORIGINS AND MECHANISMS OF PATHOGENESIS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Draft genome sequences of the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopiles with photosynthetic algae such as diatoms, and the Phytophthora sequences sugges...

  14. Sequencing the Genome of the Heirloom Watermelon Cultivar Charleston Gray

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the watermelon cultivar Charleston Gray, a major heirloom which has been used in breeding programs of many watermelon cultivars, was sequenced. Our strategy involved a hybrid approach using the Illumina and 454/Titanium next-generation sequencing technologies. For Illumina, shotgun g...

  15. Complete Genome Sequence of Kocuria palustris MU14/1

    PubMed Central

    Foecking, Mark F.

    2015-01-01

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences. PMID:26472837

  16. Complete Genome Sequence of the Alfalfa latent virus

    PubMed Central

    Shao, Jonathan; Postnikova, Olga A.

    2015-01-01

    The first complete genome sequence of the Alfalfa latent carlavirus (ALV) was obtained by primer walking and Illumina RNA sequencing. The virus differs substantially from the Czech ALV isolate and the Pea streak virus isolate from Wisconsin. The absence of a clear nucleic acid-binding protein indicates ALV divergence from other carlaviruses. PMID:25883281

  17. Genome sequence of Stachybotrys chartarum Strain 51-11

    EPA Science Inventory

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  18. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037

    PubMed Central

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias

    2013-01-01

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem. PMID:23704179

  19. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037.

    PubMed

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

    2013-01-01

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem. PMID:23704179

  20. Draft Genome Sequence of Toluene-Resistant Staphylococcus epidermidis SNUT

    PubMed Central

    Kim, Beomsoo; Kim, Jingyu; Park, Hyun

    2016-01-01

    Here, we report the draft sequence of the Gram-positive toluene-resistant bacterium Staphylococcus epidermidis SNUT. The draft genome sequence is 2,511,658 bases, with 2,346 protein-coding genes, 57 tRNA-coding genes, and 8 rRNA genes. PMID:26941142

  1. Complete Genome Sequence of Caulobacter crescentus Siphophage Sansa

    PubMed Central

    Vara, Leonardo; Kane, Ashley A.; Cahill, Jesse L.; Rasche, Eric S.

    2015-01-01

    Caulobacter crescentus is a Gram-negative dimorphic model organism used to study cell differentiation. Siphophage Sansa is a newly isolated siphophage with an icosahedral capsid that infects C. crescentus. Sansa shares no sequence similarity to other phages deposited in GenBank. Here, we describe its genome sequence and general features. PMID:26450723

  2. GENOMIC SEQUENCE ANALYSIS OF LEPTOSPIRA BORGPETERSENII SEROVAR HARDJO

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A genomic library from Leptospira borgpetersenii serovar hardjo strain JB197 was prepared by mechanically shearing the DNA and inserting it into a positive selection vector. DNA was prepared from approximately 22,000 random clones and used as templates for automated sequencing. Sequence data was c...

  3. Complete Genome Sequences of Mandrillus leucophaeus and Papio ursinus Cytomegaloviruses.

    PubMed

    Blewett, Earl Linwood; Sherrod, Carly J; Texier, Jordan R; Conrad, Tom M; Dittmer, Dirk P

    2015-01-01

    The complete genome sequences of Mandrillus leucophaeus and Papio ursinus cytomegaloviruses were determined. An isolate from a drill monkey, OCOM6-2, and an isolate from a chacma baboon, OCOM4-52, were subjected to pyrosequencing and assembled. Comparative alignment of published primate cytomegaloviruses (CMVs) showed variable sequence conservation between species. PMID:26251484

  4. Draft Genome Sequence of Type Strain Streptococcus gordonii ATCC 10558

    PubMed Central

    Rasmussen, Louise H.; Dargis, Rimtas; Skovgaard, Ole

    2016-01-01

    Streptococcus gordonii ATCC 10558(T) was isolated from a patient with infective endocarditis in 1946 and announced as a type strain in 1989. Here, we report the 2,154,510-bp draft genome sequence of S. gordonii ATCC 10558(T). This sequence will contribute to knowledge about the pathogenesis of infective endocarditis. PMID:26893427

  5. Genome Sequences of Five Nonvirulent Listeria monocytogenes Serovar 4 Strains.

    PubMed

    Sumrall, Eric; Klumpp, Jochen; Shen, Yang; Loessner, Martin J

    2016-01-01

    We present the complete genome sequences of five nonpathogenic Listeria monocytogenes serovar 4 strains: WSLC 1018 (4e), 1019 (4c), 1020 (4a), 1033 (4d), and 1047 (4d). These sequences may help to uncover genes involved in the synthesis of the serovar antigens, phenotypic determinants of virulence deemed clinically relevant. PMID:27034489

  6. Draft Genome Sequence of Lactobacillus fermentum NB-22

    PubMed Central

    Shkoporov, A. N.; Efimov, B. A.; Pikina, A. P.; Borisova, O. Y.; Gladko, I. A.; Postnikova, E. A.; Lordkipanidze, A. E.; Kafarskaia, L. I.

    2015-01-01

    We announce here a draft genome sequence of Lactobacillus fermentum NB-22, a strain isolated from human vaginal microbiota. The assembled sequence consists of 190 contigs, joined into 137 scaffolds, and the total size is 2.01 Mb. PMID:26272572

  7. Draft Genome Sequence of Lactobacillus fermentum NB-22.

    PubMed

    Chaplin, A V; Shkoporov, A N; Efimov, B A; Pikina, A P; Borisova, O Y; Gladko, I A; Postnikova, E A; Lordkipanidze, A E; Kafarskaia, L I

    2015-01-01

    We announce here a draft genome sequence of Lactobacillus fermentum NB-22, a strain isolated from human vaginal microbiota. The assembled sequence consists of 190 contigs, joined into 137 scaffolds, and the total size is 2.01 Mb. PMID:26272572

  8. Draft Genome Sequences of Two Toxigenic Corynebacterium ulcerans Strains

    PubMed Central

    Fournier, Eric; Massé, Cynthia; Charest, Hugues; Bernard, Kathryn; Côté, Jean-Charles; Tremblay, Cécile

    2015-01-01

    Here, we present the draft genome sequences of two toxigenic Corynebacterium ulcerans strains isolated from two different patients: one from a blood sample and the other from a scar exudate following surgery. Although these two strains harbor the diphtheria toxin gene tox, no full prophage sequences were found in the flanking regions. PMID:26112794

  9. Genome Sequences of Five Nonvirulent Listeria monocytogenes Serovar 4 Strains

    PubMed Central

    Shen, Yang; Loessner, Martin J.

    2016-01-01

    We present the complete genome sequences of five nonpathogenic Listeria monocytogenes serovar 4 strains: WSLC 1018 (4e), 1019 (4c), 1020 (4a), 1033 (4d), and 1047 (4d). These sequences may help to uncover genes involved in the synthesis of the serovar antigens—phenotypic determinants of virulence deemed clinically relevant. PMID:27034489

  10. Draft Genome Sequence of Toluene-Resistant Staphylococcus epidermidis SNUT.

    PubMed

    Kim, Beomsoo; Kim, Jingyu; Park, Hyun; Park, Joonho

    2016-01-01

    Here, we report the draft sequence of the Gram-positive toluene-resistant bacterium Staphylococcus epidermidis SNUT. The draft genome sequence is 2,511,658 bases, with 2,346 protein-coding genes, 57 tRNA-coding genes, and 8 rRNA genes. PMID:26941142

  11. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  12. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861
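
    The screening idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple exact k-mer lookup, which is not necessarily the matching method the authors used; the function names, the k value, and the toy sequences are all hypothetical. For scale, 0.23% of 20,000 sequences corresponds to about 46 flagged entries.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def vector_hits(entry, vector, k=12):
    """Return start positions in entry whose k-mer also occurs in vector.

    Exact matching only: an assumption made for brevity. A real screen
    would tolerate mismatches and check both strands.
    """
    vector_kmers = kmers(vector, k)
    entry = entry.upper()
    return [i for i in range(len(entry) - k + 1)
            if entry[i:i + k] in vector_kmers]


def fraction_flagged(entries, vector, k=12):
    """Fraction of entries with at least one exact k-mer match to vector."""
    flagged = sum(1 for e in entries if vector_hits(e, vector, k))
    return flagged / len(entries)


# Toy data (hypothetical, not taken from GenBank).
toy_vector = "GAATTCGAGCTCGGTACCCGGGGATCC"
clean_entry = "ATATATATATATATATATATAT"
contaminated_entry = "AAAA" + toy_vector[5:20] + "TTTT"
```

    Calling fraction_flagged on a list of entries gives the kind of database-wide rate the abstract reports, under the stated exact-match assumption.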

  13. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  14. Complete Genome Sequence of Southern tomato virus Identified in China Using Next-Generation Sequencing

    PubMed Central

    Padmanabhan, Chellappan; Zheng, Yi; Li, Rugang; Sun, Shu-E; Zhang, Deyong; Liu, Yong; Fei, Zhangjun

    2015-01-01

    The complete genome sequence of Southern tomato virus (STV), a double-stranded RNA virus that affects tomato in China, was determined using small RNA deep sequencing. This Chinese isolate shares 99% sequence identity to other isolates from Mexico, France, Spain, and the United States. This is the first report of STV infecting tomatoes in Asia. PMID:26494671

  15. Complete Genome Sequence of Southern tomato virus Identified in China Using Next-Generation Sequencing.

    PubMed

    Padmanabhan, Chellappan; Zheng, Yi; Li, Rugang; Sun, Shu-E; Zhang, Deyong; Liu, Yong; Fei, Zhangjun; Ling, Kai-Shu

    2015-01-01

    The complete genome sequence of Southern tomato virus (STV), a double-stranded RNA virus that affects tomato in China, was determined using small RNA deep sequencing. This Chinese isolate shares 99% sequence identity to other isolates from Mexico, France, Spain, and the United States. This is the first report of STV infecting tomatoes in Asia. PMID:26494671

  16. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  17. Widespread mitovirus sequences in plant genomes

    PubMed Central

    Warner, Benjamin E.; Yerramsetty, Pradeep

    2015-01-01

    The exploration of the evolution of RNA viruses has been aided recently by the discovery of copies of fragments or complete genomes of non-retroviral RNA viruses (Non-retroviral Endogenous RNA Viral Elements, or NERVEs) in many eukaryotic nuclear genomes. Among the most prominent NERVEs are partial copies of the RNA dependent RNA polymerase (RdRP) of the mitoviruses in plant mitochondrial genomes. Mitoviruses are in the family Narnaviridae, which are the simplest viruses, encoding only a single protein (the RdRP) in their unencapsidated viral plus strand. Narnaviruses are known only in fungi, and the origin of plant mitochondrial mitovirus NERVEs appears to be horizontal transfer from plant pathogenic fungi. At least one mitochondrial mitovirus NERVE, but not its nuclear copy, is expressed. PMID:25870770

  18. Identification of genes in genomic and EST sequences

    SciTech Connect

    Fields, C.; Adams, M.D.; Kerlavage, A.R.; Dubnick, M.; McCombie, W.R.; Martin-Gallardo, A.; Venter, J.C.; White, O.

    1993-12-31

    Currently available software tools are capable of predicting the locations of most protein-coding genes in anonymous genomic DNA sequences. The use of predicted exons to select primers for PCR amplification from cDNA libraries allows the complete structures of novel genes to be determined efficiently. As the number of expressed sequence tag (EST) sequences increases, the fraction of genes that can be localized in genomic sequences by searching EST databases will rapidly approach unity. The challenge for automated DNA sequence analysis is now to develop methods for accurately predicting gene structure and alternative splicing patterns. Substantially improving current accuracies in gene structure prediction will require retrospective comparative analysis of sequences from different organisms and gene families.

  19. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background: The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions: This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  20. Genome Sequencing Highlights the Dynamic Early History of Dogs

    PubMed Central

    Freedman, Adam H.; Gronau, Ilan; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Ostrander, Elaine A.; Marques-Bonet, Tomas; Wayne, Robert K.; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary. PMID:24453982

  1. Genome sequencing highlights the dynamic early history of dogs.

    PubMed

    Freedman, Adam H; Gronau, Ilan; Schweizer, Rena M; Ortega-Del Vecchyo, Diego; Han, Eunjung; Silva, Pedro M; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Beale, Holly; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R; Parker, Heidi G; Lee, Clarence; Tadigotla, Vasisht; Wilton, Alan; Siepel, Adam; Bustamante, Carlos D; Harkins, Timothy T; Nelson, Stanley F; Ostrander, Elaine A; Marques-Bonet, Tomas; Wayne, Robert K; Novembre, John

    2014-01-01

    To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary. PMID:24453982

  2. LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Task 1.4.2 Report

    SciTech Connect

    Slezak, T; Borucki, M; Lam, M; Lenhoff, R; Vitalis, E

    2010-01-26

    Good progress has been made on both bacterial and viral sequencing by the TMTI centers. While access to appropriate samples is a limiting factor to throughput, excellent progress has been made with respect to getting agreements in place with key sources of relevant materials. Sharing of sequenced genomes funded by TMTI has been extremely limited to date. The April 2010 exercise should force a resolution to this, but additional managerial pressures may be needed to ensure that rapid sharing of TMTI-funded sequencing occurs, regardless of collaborator constraints concerning ultimate publication(s). Policies to permit TMTI-internal rapid sharing of sequenced genomes should be written into all TMTI agreements with collaborators now being negotiated. TMTI needs to establish a Web-based system for tracking samples destined for sequencing. This includes metadata on sample origins and contributor, information on sample shipment/receipt, prioritization by TMTI, assignment to one or more sequencing centers (including possible TMTI-sponsored sequencing at a contributor site), and status history of the sample sequencing effort. While this system could be a component of the AFRL system, it is not part of any current development effort. Policy and standardized procedures are needed to ensure appropriate verification of all TMTI samples prior to the investment in sequencing. PCR, arrays, and classical biochemical tests are examples of potential verification methods. Verification is needed to detect mislabeled, degraded, mixed, or contaminated samples. Regular QC exercises are needed to ensure that the TMTI-funded centers are meeting all standards for producing quality genomic sequence data.

  3. Sequence-based physical mapping of complex genomes by whole genome profiling

    PubMed Central

    van Oeveren, Jan; de Ruiter, Marjo; Jesse, Taco; van der Poel, Hein; Tang, Jifeng; Yalcin, Feyruz; Janssen, Antoine; Volpin, Hanne; Stormo, Keith E.; Bogden, Robert; van Eijk, Michiel J.T.; Prins, Marcel

    2011-01-01

    We present whole genome profiling (WGP), a novel next-generation sequencing-based physical mapping technology for construction of bacterial artificial chromosome (BAC) contigs of complex genomes, using Arabidopsis thaliana as an example. WGP leverages short read sequences derived from restriction fragments of two-dimensionally pooled BAC clones to generate sequence tags. These sequence tags are assigned to individual BAC clones, followed by assembly of BAC contigs based on shared regions containing identical sequence tags. Following in silico analysis of WGP sequence tags and simulation of a map of Arabidopsis chromosome 4 and maize, a WGP map of Arabidopsis thaliana ecotype Columbia was constructed de novo using a six-genome equivalent BAC library. Validation of the WGP map using the Columbia reference sequence confirmed that 350 BAC contigs (98%) were assembled correctly, spanning 97% of the 102-Mb calculated genome coverage. We demonstrate that WGP maps can also be generated for more complex plant genomes and will serve as excellent scaffolds to anchor genetic linkage maps and integrate whole genome sequence data. PMID:21324881
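
    The contig-building step described above (BAC clones joined when they share identical sequence tags) can be sketched as a connected-components grouping. This is only an illustration under assumptions: the function name, the min_shared threshold, and the toy tag sets are hypothetical, and the published WGP pipeline is likely to use more elaborate overlap scoring and clone ordering.

```python
from collections import defaultdict


def build_contigs(bac_tags, min_shared=2):
    """Group BAC clones into contigs.

    bac_tags maps a BAC name to the set of sequence tags assigned to it.
    Two BACs are linked when they share at least min_shared tags
    (hypothetical threshold); contigs are the connected components.
    """
    parent = {bac: bac for bac in bac_tags}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bacs = sorted(bac_tags)
    for i, a in enumerate(bacs):
        for b in bacs[i + 1:]:
            if len(bac_tags[a] & bac_tags[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for bac in bacs:
        groups[find(bac)].append(bac)
    return sorted(groups.values())


# Toy input (hypothetical tag names).
toy_tags = {
    "BAC1": {"t1", "t2", "t3"},
    "BAC2": {"t2", "t3", "t4"},
    "BAC3": {"t4", "t5", "t6"},
    "BAC4": {"t7", "t8"},
}
```

    With the toy input, BAC1 and BAC2 share two tags and fall into one contig, while BAC3 joins them only if min_shared is lowered to 1.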

  4. Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing.

    PubMed

    Horner, David Stephen; Pavesi, Giulio; Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Liuni, Sabino; Sammeth, Michael; Picardi, Ernesto; Pesole, Graziano

    2010-03-01

    Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing. PMID:19864250

  5. Complete Genome Sequences of Three Strains of Coxsackievirus A7

    PubMed Central

    Ylä-Pelto, Jani; Koskinen, Satu; Karelehto, Eveliina; Sittig, Eleonora; Roivainen, Merja; Hyypiä, Timo

    2013-01-01

    Genomes of three strains (Parker, USSR, and 275/58) of coxsackievirus A7 (CV-A7) were amplified by the long reverse transcription (RT)-PCR method and sequenced. While the sequences of Parker and USSR were identical, the similarities of 275/58 to the CV-A7 reference sequence, accession no. AY421765, were 82.6% and 96.2% for nucleotides and amino acids, respectively. PMID:23580710

  6. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    NASA Astrophysics Data System (ADS)

    Chechetkin, V. R.; Lobzin, V. V.

    2004-07-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions.
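
    As a rough illustration of the quantity described above, the sketch below computes Fourier structure factors for the indicator sequence of a single nucleotide and takes the Shannon entropy of the normalized spectrum. The normalization and the restriction to one base at a time are assumptions made here for brevity; the abstract does not state the authors' exact definition, and the function names are hypothetical.

```python
import cmath
import math


def structure_factors(seq, base):
    """Return |rho(q_n)|^2 for n = 1..M-1, where rho is the discrete
    Fourier transform of the 0/1 indicator sequence of `base`."""
    m = len(seq)
    indicator = [1.0 if s == base else 0.0 for s in seq.upper()]
    factors = []
    for n in range(1, m):
        rho = sum(x * cmath.exp(-2j * math.pi * n * pos / m)
                  for pos, x in enumerate(indicator)) / math.sqrt(m)
        factors.append(abs(rho) ** 2)
    return factors


def spectral_entropy(seq, base):
    """Shannon entropy of the normalized structure factors (assumed form).

    Low values indicate power concentrated in a few harmonics (ordered,
    periodic placement); high values indicate a flat spectrum.
    """
    factors = structure_factors(seq, base)
    total = sum(factors)
    if total < 1e-12:  # constant indicator: no non-zero harmonics
        return 0.0
    probs = [f / total for f in factors]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

    Under these assumptions, a strictly period-4 placement of a base concentrates the power in three harmonics, so the entropy is ln 3, whereas a single isolated occurrence gives a flat spectrum and the maximal value ln(M - 1).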

  7. Complete Chloroplast Genome Sequence of a Major Allogamous Forage Species, Perennial Ryegrass (Lolium perenne L.)

    PubMed Central

    Diekmann, Kerstin; Hodkinson, Trevor R.; Wolfe, Kenneth H.; van den Bekerom, Rob; Dix, Philip J.; Barth, Susanne

    2009-01-01

    Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1–27 codons in comparison of L. perenne to other Poaceae and 1–68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT–PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan. PMID:19414502

  8. A DRAFT SEQUENCE OF THE RICE GENOME (ORYZA SATIVA L. SSP. INDICA)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50...

  9. Draft genome sequences of two virulent serotypes of avian Pasteurella multocida

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent Pasteurella multocida strain Pm70....

  10. The first draft of the pigeonpea genome sequence.

    PubMed

    Singh, Nagendra K; Gupta, Deepak K; Jayaswal, Pawan K; Mahato, Ajay K; Dutta, Sutapa; Singh, Sangeeta; Bhutani, Shefali; Dogra, Vivek; Singh, Bikram P; Kumawat, Giriraj; Pal, Jitendra K; Pandit, Awadhesh; Singh, Archana; Rawal, Hukum; Kumar, Akhilesh; Rama Prashat, G; Khare, Ambika; Yadav, Rekha; Raje, Ranjit S; Singh, Mahendra N; Datta, Subhojit; Fakrudin, Bashasab; Wanjari, Keshav B; Kansal, Rekha; Dash, Prasanta K; Jain, Pradeep K; Bhattacharya, Ramcharan; Gaikwad, Kishor; Mohapatra, Trilochan; Srinivasan, R; Sharma, Tilak R

    2012-01-01

    Pigeonpea (Cajanus cajan) is an important grain legume of the Indian subcontinent, South-East Asia and East Africa. More than eighty-five percent of the world's pigeonpea is produced and consumed in India, where it is a key crop for food and nutritional security of the people. Here we present the first draft of the genome sequence of a popular pigeonpea variety 'Asha'. The genome was assembled using long sequence reads of 454 GS-FLX sequencing chemistry with mean read lengths of >550bp and >10-fold genome coverage, resulting in 510,809,477bp of high quality sequence. A total of 47,004 protein-coding genes and 12,511 transposable element-related genes were predicted. We identified 1,213 disease resistance/defense response genes and 152 abiotic stress tolerance genes in the pigeonpea genome that make it a hardy crop. In comparison to soybean, pigeonpea has relatively fewer genes for lipid biosynthesis and a larger number of genes for cellulose synthesis. The sequence contigs were arranged into 59,681 scaffolds, which were anchored to eleven chromosomes of pigeonpea with 347 genic-SNP markers of an intra-species reference genetic map. Eleven pigeonpea chromosomes showed low but significant synteny with the twenty chromosomes of soybean. The genome sequence was used to identify a large number of hypervariable 'Arhar' simple sequence repeat (HASSR) markers, 437 of which were experimentally validated for PCR amplification and high rate of polymorphism among pigeonpea varieties. These markers will be useful for fingerprinting and diversity analysis of pigeonpea germplasm and molecular breeding applications. This is the first plant genome sequence completed entirely through a network of Indian institutions led by the Indian Council of Agricultural Research and provides a valuable resource for pigeonpea variety improvement. PMID:24431589

  11. Sequence-Based Mapping of the Polyploid Wheat Genome

    PubMed Central

    Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

    2013-01-01

    The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40−100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome. PMID:23665877

  12. Genome Sequences of Mycobacteriophages Luchador and Nerujay.

    PubMed

    Pope, Welkin H; Ahmed, Taha; Drobitch, Marissa K; Early, David R; Eljamri, Soukaina; Kasturiarachi, Naomi S; Klonicki, Emily F; Manjooran, Daniel T; Ní Chochlain, Aífe N; Puglionesi, Andrew O; Rajakumar, Vinod; Shindle, Katherine A; Tran, Mai T; Brown, Bryony R; Churilla, Bryce M; Cohen, Karen L; Wilkes, Kellyn E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-01-01

    Luchador and Nerujay are two newly isolated mycobacteriophages recovered from soil samples using Mycobacterium smegmatis. Their genomes are 53,387 bp and 53,455 bp long and have 96 and 97 predicted open reading frames, respectively. Nerujay is related to subcluster A1 phages, and Luchador represents a new subcluster, A14. PMID:26089414

  13. Genome Sequences of Mycobacteriophages Luchador and Nerujay

    PubMed Central

    Ahmed, Taha; Drobitch, Marissa K.; Early, David R.; Eljamri, Soukaina; Kasturiarachi, Naomi S.; Klonicki, Emily F.; Manjooran, Daniel T.; Ní Chochlain, Aífe N.; Puglionesi, Andrew O.; Rajakumar, Vinod; Shindle, Katherine A.; Tran, Mai T.; Brown, Bryony R.; Churilla, Bryce M.; Cohen, Karen L.; Wilkes, Kellyn E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Luchador and Nerujay are two newly isolated mycobacteriophages recovered from soil samples using Mycobacterium smegmatis. Their genomes are 53,387 bp and 53,455 bp long and have 96 and 97 predicted open reading frames, respectively. Nerujay is related to subcluster A1 phages, and Luchador represents a new subcluster, A14. PMID:26089414

  14. Next Generation DNA Sequencing and the Future of Genomic Medicine

    PubMed Central

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis. PMID:24710010

  15. New sequencers to take on the genome

    SciTech Connect

    Not Available

    1987-10-16

    DOE is exploring technologies that may allow sequencing thousands of bases a second, for less than a penny a base. There is considerable skepticism about whether that rate can actually be attained anytime soon. At this stage, most efforts are focusing on working out the bugs for the first generation of automated DNA sequencers. Sequencing by conventional techniques involves generating a series of DNA fragments that vary in length by one nucleotide base; that is, they start at the same point and end at different bases, an A, G, T, or C.
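
    The fragment ladder described in this record can be illustrated with a minimal Python sketch. It assumes an idealized experiment in which every fragment length is observed exactly once; the function name and the toy ladder are hypothetical and are not taken from the record.

```python
def read_ladder(fragments_by_base):
    """Reconstruct a sequence from chain-terminated fragment lengths.

    fragments_by_base maps each terminating base (A, C, G or T) to the
    lengths of the fragments that end in that base.
    """
    length_to_base = {}
    for base, lengths in fragments_by_base.items():
        for length in lengths:
            if length in length_to_base:
                raise ValueError(f"two fragments of length {length}")
            length_to_base[length] = base
    # Sorting by length mimics the order in which fragments separate by size.
    return "".join(length_to_base[n] for n in sorted(length_to_base))


# Toy ladder (invented data): fragments share a start and end at different bases.
toy_ladder = {"A": [2], "C": [5], "G": [1], "T": [3, 4]}
sequence = read_ladder(toy_ladder)
print(sequence)
```

    Because each fragment is one base longer than the previous one, reading the terminating bases in order of length spells out the sequence.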

  16. Performance comparison of whole-genome sequencing platforms

    PubMed Central

    Lam, Hugo Y K; Clark, Michael J; Chen, Rui; Chen, Rong; Natsoulis, Georges; O’Huallachain, Maeve; Dewey, Frederick E; Habegger, Lukas; Ashley, Euan A; Gerstein, Mark B; Butte, Atul J; Ji, Hanlee P; Snyder, Michael

    2014-01-01

    Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ~76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ~3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms. PMID:22178993
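
    The abstract does not state how concordance was computed. As a rough sketch only, the Python below assumes concordance means the number of calls shared by both platforms divided by the number of unique calls from either platform; the variant tuples are invented toy data.

```python
def concordance(calls_a, calls_b):
    """Summarize the overlap between two sets of variant calls.

    Each call is a hashable key such as (chromosome, position, ref, alt).
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Assumed definition: shared calls over all unique calls.
        "fraction": len(shared) / len(union) if union else 0.0,
    }


# Toy call sets (invented) standing in for two sequencing platforms.
platform_a = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
platform_b = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr3", 10, "T", "C")}
summary = concordance(platform_a, platform_b)
print(summary)
```

    The platform-specific counts (only_a, only_b) correspond to the kind of calls the record reports as needing separate validation.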

  17. Systematic genome sequence differences among leaf cells within individual trees

    PubMed Central

    2014-01-01

    Background: Even in the age of next-generation sequencing (NGS), it has been unclear whether or not cells within a single organism have systematically distinctive genomes. Resolving this question, one of the most basic biological problems associated with DNA mutation rates, can assist efforts to elucidate essential mechanisms of cancer. Results: Using genome profiling (GP), we detected considerable systematic variation in genome sequences among cells in individual woody plants. The degree of genome sequence difference (genomic distance) varied systematically from the bottom to the top of the plant, such that the greatest divergence was observed between leaf genomes from uppermost branches and the remainder of the tree. This systematic variation was observed within both Yoshino cherry and Japanese beech trees. Conclusions: As measured by GP, the genomic distance between two cells within an individual organism was non-negligible, and was correlated with physical distance (i.e., branch-to-branch distance). This phenomenon was assumed to be the result of accumulation of mutations from each cell division, implying that the degree of divergence is proportional to the number of generations separating the two cells. PMID:24548431

  18. Human papillomavirus type 70 genome cloned from overlapping PCR products: complete nucleotide sequence and genomic organization.

    PubMed Central

    Forslund, O; Hansson, B G

    1996-01-01

    The genome of human papillomavirus (HPV) type 70 (HPV 70), isolated from a cervical condyloma, was obtained by cloning overlapping PCR products. By automated DNA sequence analysis, the genome was found to consist of 7,905 bp with a G + C content of 40%. The genomic organization showed the characteristic features shared by other sequenced HPVs. Nucleotide sequence comparison with previously known HPV types demonstrated the closest homology with HPV 68 (82%), HPV 39 (82%), HPV 18 (70%), HPV 45 (70%), and HPV 59 (70%). Comparison with seven other partially sequenced HPV 70 isolates showed homologies of between 100 and 99.5%. Cloning of overlapping PCR products and automated DNA sequence analysis was found to be a feasible method of obtaining full-length sequences of HPVs. PMID:8815087

  19. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those that contain repeats of 1–3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom, and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only had a general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution. PMID:26630570
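
    A simple way to locate the mono-, di- and trinucleotide SSRs discussed in this record is a regular-expression scan. The Python sketch below is an illustration under stated assumptions, not the authors' method: the minimum of four repeats and the function name are hypothetical choices.

```python
import re


def find_ssrs(seq, max_unit=3, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-base unit followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are a shorter unit repeated (e.g. "AA" is mono).
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence (invented) with one mono-, one di- and one trinucleotide SSR.
toy = "CC" + "A" * 5 + "GG" + "AT" * 4 + "GG" + "TTC" * 4 + "G"
ssrs = find_ssrs(toy)
print(ssrs)
```

    From such a list, the share of A/T-rich units per repeat class can be tallied, which is the kind of percentage the abstract compares across species.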

  20. Adaptive seeds tame genomic sequence comparison.

    PubMed

    Kiełbasa, Szymon M; Wan, Raymond; Sato, Kengo; Horton, Paul; Frith, Martin C

    2011-03-01

    The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition. PMID:21209072
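
    The idea of choosing seeds by rareness rather than by fixed length can be sketched in a few lines of Python. This is a naive illustration, not the LAST implementation: it assumes a seed is the shortest match starting at each query position that occurs at most max_hits times in the reference, and it counts occurrences by brute force where a real tool would use an index. All names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(query, reference, max_hits=1):
    """For each query position, keep the shortest sufficiently rare match."""
    seeds = []
    for i in range(len(query)):
        for j in range(i + 1, len(query) + 1):
            hits = count_occurrences(reference, query[i:j])
            if hits == 0:
                break  # longer substrings from this position cannot match either
            if hits <= max_hits:
                seeds.append((i, query[i:j]))
                break
    return seeds


# Toy example (invented): the reference is rich in the repeat "AC".
reference = "ACACACGTACAC"
query = "ACGT"
seeds = adaptive_seeds(query, reference)
print(seeds)
```

    In the toy example the seed that starts inside the repetitive "AC" region has to grow to three bases before it becomes unique, while seeds starting at rare bases stay short. That is the intended effect: seed length adapts to local composition.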

  1. Genome sequence of the pea aphid Acyrthosiphon pisum.

    PubMed

    2010-02-01

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

  2. Genome Sequence of the Pea Aphid Acyrthosiphon pisum

    PubMed Central

    2010-01-01

    Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

  3. Genomic Sequence or Signature Tags (GSTs) from the Genome Group at Brookhaven National Laboratory (BNL)

    DOE Data Explorer

    Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K.

    Genomic Signature Tags (GSTs) are the products of a method we have developed for identifying and quantitatively analyzing genomic DNAs. The DNA is initially fragmented with a type II restriction enzyme. An oligonucleotide adaptor containing a recognition site for MmeI, a type IIS restriction enzyme, is then used to release 21-bp tags from fixed positions in the DNA relative to the sites recognized by the fragmenting enzyme. These tags are PCR-amplified, purified, concatenated and then cloned and sequenced. The tag sequences and abundances are used to create a high resolution GST sequence profile of the genomic DNA. [Quoted from Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA, Dunn, John J.; McCorkle, Sean R.; Praissman, Laura A.; Hind, Geoffrey; Van der Lelie, Daniel; Bahou, Wadie F.; Gnatenko, Dmitri V.; Krause, Maureen K., Revised 9/13/2002]
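
    The tag-and-count logic of the method can be mimicked in silico. The Python sketch below is a simplified illustration only: it assumes the fragmenting enzyme recognizes CATG (an assumption, since the record does not name the enzyme), takes the 21 bases immediately downstream of each site on one strand, and ignores the adaptor, the exact MmeI cut geometry and the reverse strand. Function names and sequences are hypothetical.

```python
from collections import Counter

TAG_LENGTH = 21  # tag length stated in the record


def extract_tags(dna, site="CATG", tag_length=TAG_LENGTH):
    """Collect the tag_length bases that follow each occurrence of site."""
    dna = dna.upper()
    tags = []
    pos = dna.find(site)
    while pos != -1:
        start = pos + len(site)
        tag = dna[start:start + tag_length]
        if len(tag) == tag_length:  # drop truncated tags at sequence ends
            tags.append(tag)
        pos = dna.find(site, pos + 1)
    return tags


def tag_profile(sequences, site="CATG"):
    """Count how often each tag is seen across a collection of sequences."""
    profile = Counter()
    for dna in sequences:
        profile.update(extract_tags(dna, site))
    return profile


# Toy input (invented): the same tag appears in two molecules.
tag = "ACGT" * 5 + "A"
profile = tag_profile(["GG" + "CATG" + tag + "TT", "CATG" + tag, "CATG" + "A" * 10])
print(profile.most_common())
```

    The resulting counter plays the role of the sequence profile described above: tag identity points to a genomic position and tag abundance reflects how often that position was sampled.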

  4. Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms.

    PubMed

    Jeong, Haeyoung; Lee, Dae-Hee; Ryu, Choong-Min; Park, Seung-Hwan

    2016-01-01

    PacBio's long-read sequencing technologies can be successfully used for a complete bacterial genome assembly using recently developed non-hybrid assemblers in the absence of second-generation, high-quality short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler resulted in nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced non-compatible results requiring post processing. The other two platforms further improved the PacBio assembly through scaffolding and a final error correction. PMID:26464377

  5. Draft genome sequence of the Tibetan antelope

    PubMed Central

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M.; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M.; Ma, Jian; Lewin, Harris A.; Xing, Jinchuan; Platt, Roy N.; Ray, David A.; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W.; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation. PMID:23673643

  6. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-01

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research. PMID:26638067

  7. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  8. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2013-02-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  9. Complete mitochondrial genome sequence of Cheirotonus jansoni (Coleoptera: Scarabaeidae).

    PubMed

    Shao, L L; Huang, D Y; Sun, X Y; Hao, J S; Cheng, C H; Zhang, W; Yang, Q

    2014-01-01

    We sequenced the complete mitochondrial genome (mitogenome) of Cheirotonus jansoni (Coleoptera: Scarabaeidae), an endangered insect species from Southeast Asia. This long legged scarab is widely collected and reared for sale, although it is rare and protected in the wild. The circular genome is 17,249 bp long and contains a typical gene complement: 13 protein-coding genes, 2 rRNA genes, 22 putative tRNA genes, and a non-coding AT-rich region. Its gene order and arrangement are identical to the common type found in most insect mitogenomes. As with all other sequenced coleopteran species, a 5-bp long TAGTA motif was detected in the intergenic space sequence located between trnS(UCN) and nad1. The atypical cox1 start codon is AAC, and the putative initiation codon for the atp8 gene appears to be GTC, instead of the frequently found ATN. By sequence comparison, the 2590-bp long non-coding AT-rich region is the second longest among the coleopterans, with two tandem repeat regions: one is 10 copies of an 88-bp sequence and the other is 2 copies of a 153-bp sequence. Additionally, the A+T content (64%) of the 13 protein-coding genes is the lowest among all sequenced coleopteran species. This newly sequenced genome aids in our understanding of the comparative biology of the mitogenomes of coleopteran species and supplies important data for the conservation of this species. PMID:24634126
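
    Base-composition figures such as the A+T content quoted in this record are simple proportions. The Python sketch below shows one way to compute them, assuming that ambiguous symbols such as N are excluded from the denominator; the function name and example strings are hypothetical.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    return sum(base in "AT" for base in counted) / len(counted)


# Toy sequences (invented), not taken from the mitogenome itself.
print(at_content("ATGCAT"))
print(at_content("aattnn"))
```

    Applied to the concatenated protein-coding genes of a mitogenome, the same calculation would give the kind of percentage reported in the abstract.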

  10. Strong nucleosomes of mouse genome including recovered centromeric sequences.

    PubMed

    Salih, Bilal F; Teif, Vladimir B; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Recently discovered strong nucleosomes (SNs) characterized by visibly periodical DNA sequences have been found to concentrate in centromeres of Arabidopsis thaliana and in transient meiotic centromeres of Caenorhabditis elegans. To find out whether such affiliation of SNs to centromeres is a more general phenomenon, we studied SNs of the Mus musculus. The publicly available genome sequences of mouse, as well as of practically all other eukaryotes do not include the centromere regions which are difficult to assemble because of a large amount of repeat sequences in the centromeres and pericentromeric regions. We recovered those missing sequences using the data from MNase-seq experiments in mouse embryonic stem cells, where the sequence of DNA inside nucleosomes, including missing regions, was determined by 100-bp paired-end sequencing. Those nucleosome sequences, which are not matching to the published genome sequence, would largely belong to the centromeres. By evaluating SN densities in centromeres and in non-centromeric regions, we conclude that mouse SNs concentrate in the centromeres of telocentric mouse chromosomes, with ~3.9 times excess compared to their density in the rest of the genome. The remaining non-centromeric SNs are harbored mainly by introns and intergenic regions, by retro-transposons, in particular. The centromeric involvement of the SNs opens new horizons for the chromosome and centromere structure studies. PMID:24998943

  11. The International Pea Genome Sequencing Project: Sequencing and Assembly Progresses Updates

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The International Consortium for the Pea Genome Sequencing (ICPG) includes scientists from six countries around the world. Its aim is to provide a high quality reference of the pea genome to the scientific community as well as to the pea breeder community. The consortium proposed a strategy that int...

  12. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics.

    PubMed

    Leggett, Richard M; Ramirez-Gonzalez, Ricardo H; Clavijo, Bernardo J; Waite, Darren; Davey, Robert P

    2013-01-01

    The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centers that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform Quality Control (QC) bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design. PMID:24381581

  13. Plasmodium knowlesi Genome Sequences from Clinical Isolates Reveal Extensive Genomic Dimorphism

    PubMed Central

    Millar, Scott B.; Sanderson, Theo; Otto, Thomas D.; Lu, Woon Chan; Krishna, Sanjeev; Rayner, Julian C.; Cox-Singh, Janet

    2015-01-01

    Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population that can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses, of even small numbers of difficult clinical malaria isolates, can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology. PMID:25830531

  14. Complete genome sequence of Arthrobacter sp. strain FB24

    PubMed Central

    Nakatsu, Cindy H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, Thomas; Han, Cliff; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-01-01

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program. PMID:24501649

  15. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    PubMed Central

    Mavromatis, Konstantinos; Sikorski, Johannes; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Thomas; Detter, John C.; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rüdiger; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family ‘Alicyclobacillaceae’. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family ‘Alicyclobacillaceae’. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304673

  16. Complete genome sequence of Klebsiella pneumoniae phage JD001.

    PubMed

    Cui, Zelin; Shen, Wenbin; Wang, Zheng; Zhang, Haotian; Me, Rao; Wang, Yanchun; Zeng, Lingbin; Zhu, Yongzhang; Qin, Jinhong; He, Ping; Guo, Xiaokui

    2012-12-01

    Klebsiella pneumoniae is a member of the family Enterobacteriaceae, opportunistic pathogens that are among the eight most prevalent infectious agents in hospitals. The emergence of multidrug-resistant strains of K. pneumoniae has become a public health problem globally. To develop an effective antimicrobial agent, we isolated a bacteriophage, named JD001, from seawater and sequenced its genome. Comparative genome analysis of phage JD001 with other K. pneumoniae bacteriophages revealed that phage JD001 has little similarity to previously published K. pneumoniae phages KP15, KP32, KP34, and phiKO2. Here we announce the complete genome sequence of JD001 and report major findings from the genomic analysis. PMID:23166250

  17. Complete Genome Sequence of Klebsiella pneumoniae Phage JD001

    PubMed Central

    Cui, Zelin; Shen, Wenbin; Wang, Zheng; Zhang, Haotian; Me, Rao; Wang, Yanchun; Zeng, Lingbin; Zhu, Yongzhang; Qin, Jinhong

    2012-01-01

    Klebsiella pneumoniae is a member of the family Enterobacteriaceae, opportunistic pathogens that are among the eight most prevalent infectious agents in hospitals. The emergence of multidrug-resistant strains of K. pneumoniae has become a public health problem globally. To develop an effective antimicrobial agent, we isolated a bacteriophage, named JD001, from seawater and sequenced its genome. Comparative genome analysis of phage JD001 with other K. pneumoniae bacteriophages revealed that phage JD001 has little similarity to previously published K. pneumoniae phages KP15, KP32, KP34, and phiKO2. Here we announce the complete genome sequence of JD001 and report major findings from the genomic analysis. PMID:23166250

  18. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    SciTech Connect

    Mavromatis, K; Sikorski, Johannes; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Tom; Detter, J C; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rudiger; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family Alicyclobacillaceae. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Alicyclobacillaceae. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Draft genome sequence of Paenibacillus sp. strain A2.

    PubMed

    Zheng, Beiwen; Zhang, Fan; Dong, Hao; Chai, Lujun; Shu, Fuchang; Yi, Shaojin; Wang, Zhengliang; Cui, Qingfeng; Dong, Hanping; Zhang, Zhongzhi; Hou, Dujie; Yang, Jinshui; She, Yuehui

    2016-01-01

    Paenibacillus sp. strain A2 is a Gram-negative rod-shaped bacterium isolated from a mixture of formation water and petroleum in Daqing oilfield, China. This facultative aerobic bacterium was found to have a broad capacity for metabolizing hydrocarbon and organosulfur compounds, which are the main reasons for the interest in sequencing its genome. Here we describe the features of Paenibacillus sp. strain A2, together with the genome sequence and its annotation. The 7,650,246 bp long genome (1 chromosome but no plasmid) exhibits a G+C content of 54.2% and contains 7575 protein-coding and 49 RNA genes, including 3 rRNA genes. One putative alkane monooxygenase, one putative alkanesulfonate monooxygenase, one putative alkanesulfonate transporter and four putative sulfate transporters were found in the draft genome. PMID:26819653

  20. Complete genome sequence of Sanguibacter keddieii type strain (ST-74).

    PubMed

    Ivanova, Natalia; Sikorski, Johannes; Sims, David; Brettin, Thomas; Detter, John C; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Pati, Amrita; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Göker, Markus; Pukall, Rüdiger; Klenk, Hans-Peter; Kyrpides, Nikos C

    2009-01-01

    Sanguibacter keddieii is the type species of the genus Sanguibacter, the only genus within the family of Sanguibacteraceae. Phylogenetically, this family is located in the neighborhood of the genus Oerskovia and the family Cellulomonadaceae within the actinobacterial suborder Micrococcineae. The strain described in this report was isolated from blood of apparently healthy cows. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Sanguibacteraceae, and the 4,253,413 bp long single replicon genome with its 3735 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304646

  1. Complete genome sequence of Arthrobacter sp. strain FB24

    SciTech Connect

    Nakatsu, C. H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, T.; Han, Cliff F.; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

    2013-09-30

    Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

  2. The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis.

    PubMed

    Duan, Naibin; Sun, Honghe; Wang, Nan; Fei, Zhangjun; Chen, Xuesen

    2016-07-01

    The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis, a widely used apple rootstock, was determined using the Illumina high-throughput sequencing approach. The genome is 422,555 bp in length and has a GC content of 45.21%. It is separated by a pair of inverted repeats of 32,504 bp, to form a large single copy region of 213,055 bp and a small single copy region of 144,492 bp. The genome contains 38 protein-coding genes, four pseudogenes, 25 tRNA genes, and three rRNA genes. The genome is 25,608 bp longer than that of M. domestica, and several structural variations between these two mitogenomes were detected. PMID:26539696
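
    The region lengths reported above can be checked for internal consistency. The following is a minimal sketch in Python, assuming the genome is made up of the two single-copy regions plus both copies of the inverted repeat; the variable and function names are illustrative and do not come from the paper. The M. domestica length is not stated in the abstract and is only implied by the reported 25,608 bp difference.

```python
# Lengths in bp, as reported in the abstract above.
TOTAL_LENGTH = 422_555
INVERTED_REPEAT = 32_504      # one copy; the genome carries a pair
LARGE_SINGLE_COPY = 213_055
SMALL_SINGLE_COPY = 144_492
LONGER_THAN_DOMESTICA = 25_608


def region_sum(lsc: int, ssc: int, ir: int) -> int:
    """Sum both single-copy regions and the two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


reconstructed = region_sum(LARGE_SINGLE_COPY, SMALL_SINGLE_COPY, INVERTED_REPEAT)

# Implied (not reported) length of the M. domestica mitogenome.
implied_domestica_length = TOTAL_LENGTH - LONGER_THAN_DOMESTICA

print("sum of regions:", reconstructed)
print("matches reported total:", reconstructed == TOTAL_LENGTH)
print("implied M. domestica length:", implied_domestica_length)
```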

  3. The complete genome sequence of Canna yellow streak virus.

    PubMed

    Monger, W A; Adams, I P; Glover, R H; Barrett, B

    2010-09-01

    Canna yellow streak virus (Potyvirus, Potyviridae) was sequenced using the novel method of next-generation pyrosequencing. The complete genome was found to be 9,502 nucleotides excluding the poly-A tail with a predicted genome organisation typical for a member of the genus Potyvirus. As with other potyviruses that infect monocotyledons, some of the predicted cleavage sites of the polyprotein genome were unusual, such as a glutamic acid/threonine (E/T) between the CI and 6K2 proteins and a glutamic acid/aspartic acid (E/D) between the NIa and NIb proteins. Evidence of the presence of endogenous pararetroviruses in the canna genome was found from the large number of sequences obtained with this method. PMID:20495988

  4. Whole genome sequencing in clinical and public health microbiology

    PubMed Central

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Summary: Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  5. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2009-01-01

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  6. Draft genome sequence of the rubber tree Hevea brasiliensis

    PubMed Central

    2013-01-01

    Background: Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results: Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions: The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber. PMID:23375136

  7. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    SciTech Connect

    Pati, Amrita; Gronow, Sabine; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Chertkov, Olga; Bruce, David; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Jeffries, Cynthia; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Complete genome sequence of Meiothermus ruber type strain (21T)

    SciTech Connect

    Tindall, Brian; Sikorski, Johannes; Lucas, Susan; Goltsman, Eugene; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Fahnrich, Regine; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Meiothermus ruber (Loginova et al. 1984) Nobre et al. 1996 is the type species of the genus Meiothermus. This thermophilic genus is of special interest, as its members can be affiliated to either low-temperature or high-temperature groups. The temperature related split is in accordance with the chemotaxonomic feature of the polar lipids. M. ruber is a representative of the low-temperature group. This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,097,457 bp long genome with its 3,052 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. The power of EST sequence data: Relation to Acyrthosiphon pisum genome annotation and functional genomics initiatives

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genes important to aphid biology, survival and reproduction were successfully identified by use of a genomics approach. We created and described the sequencing, compilation, and annotation of the approximately 525 Mb nuclear genome of the pea aphid, Acyrthosiphon pisum, which represents an important ...

  10. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing

    PubMed Central

    Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.

    2015-01-01

    Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853
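
    The SSR (simple sequence repeat) discovery step described above can be illustrated with a small sketch. This is not the authors' pipeline: it is a minimal Python example that assumes an SSR is a perfect tandem repeat of a 2 or 3 bp motif with at least five copies, and the function name, thresholds, and toy sequence are all hypothetical.

```python
import re


def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3)):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Assumed definition: a motif of length k repeated at least
    min_repeats times in a row. Homopolymer runs are skipped.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # ([ACGT]{k}) captures the motif; \1{n,} requires n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. TTTTTT is a homopolymer, not an SSR here
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: an (AT)6 repeat, a (GAA)5 repeat, and a poly-T run.
toy = "GGC" + "AT" * 6 + "CCGT" + "GAA" * 5 + "T" * 12 + "C"
ssrs = find_ssrs(toy)
for start, motif, copies in ssrs:
    print(start, motif, copies)
```

    Flanking primers of the kind mentioned in the abstract would then be designed from the sequence on either side of each reported start position; that step is outside the scope of this sketch.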

  11. An effective alternate cloning strategy for unstable mouse genomic sequences.

    PubMed

    Lan, Michael S; Muguira, Michelle

    2005-05-13

    Unstable mammalian genomic sequences frequently underwent spontaneous rearrangement during the bacterial cloning process. When the flanking sequences of an INSM1 gene comprised of 3.0 and 4.5 kb were subcloned into a targeting vector for a gene deletion study, both the genomic sequences underwent spontaneous rearrangement. Neither the usage of recombinase-free Escherichia coli competent cells nor lowering the culture incubation temperature averted the recombination events. Co-transformation of a methyltransferase vector, pAIT2, with the targeting vector had little effect in preventing recombination through methylation of the plasmid DNA. Here, we show that a single-copy cloning technique is effective to clone the unstable mouse genomic DNA into the targeting vector. PMID:15809045

  12. Exploring genome characteristics and sequence quality without a reference

    PubMed Central

    2014-01-01

    Motivation: The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly. Results: This article addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of sequence reads. The software implementation calculates per-base error rates, paired-end fragment-size distributions and coverage metrics in the absence of a reference genome. Additionally, the software will estimate characteristics of the sequenced genome, such as repeat content and heterozygosity that are key determinants of assembly difficulty. Availability: The software described is freely available online (https://github.com/jts/sga) and open source under the GNU Public License. Contact: jared.simpson@oicr.on.ca Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24443382
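
    One common way to look at coverage without a reference genome is a k-mer count histogram: k-mers from the genome appear many times across reads, while k-mers created by sequencing errors tend to appear once. The sketch below illustrates that general idea only; it is an assumption for illustration and is not taken from the software described above. The function names and toy reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def count_histogram(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


# Two overlapping error-free toy reads plus one read with a single error.
toy_reads = ["ACGTAC", "CGTACG", "ACGTTC"]
histogram = count_histogram(kmer_counts(toy_reads, k=3))
for multiplicity in sorted(histogram):
    print(multiplicity, histogram[multiplicity])
```

    In real data the low-multiplicity end of such a histogram is dominated by error k-mers, and the position of the main peak reflects sequencing depth.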

  13. Molecular Poltergeists: Mitochondrial DNA Copies (numts) in Sequenced Nuclear Genomes

    PubMed Central

    Hazkani-Covo, Einat; Zeller, Raymond M.; Martin, William

    2010-01-01

    The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary time. In the laboratory and in nature, numts enter the nuclear DNA via non-homologous end joining (NHEJ) at double-strand breaks (DSBs). The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency. Polymorphic numts in humans link maternally inherited mitochondrial genotypes to nuclear DNA haplotypes during the past, offering new opportunities to associate nuclear markers with mitochondrial markers back in time. PMID:20168995

  14. Genome sequence of the model medicinal mushroom Ganoderma lucidum

    PubMed Central

    Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao

    2012-01-01

    Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi. PMID:22735441

  15. Complete genome sequence of Croceibacter atlanticus HTCC2559T.

    PubMed

    Oh, Hyun-Myung; Kang, Ilnam; Ferriera, Steve; Giovannoni, Stephen J; Cho, Jang-Cheon

    2010-09-01

    Here we announce the complete genome sequence of Croceibacter atlanticus HTCC2559(T), which was isolated by high-throughput dilution-to-extinction culturing from the Bermuda Atlantic Time Series station in the Western Sargasso Sea. Strain HTCC2559(T) contained genes for carotenoid biosynthesis, flavonoid biosynthesis, and several macromolecule-degrading enzymes. The genome confirmed physiological observations of cultivated Croceibacter atlanticus strain HTCC2559(T), which identified it as an obligate chemoheterotroph. PMID:20639333

  16. Genome sequencing, annotation of Citrobacter freundii strain GTC 09479.

    PubMed

    Kimura, Kazuyuki; Kumar, Shailesh; Takeo, Masahiro; Mayilraj, Shanmugam

    2014-12-01

    We report the 4.9-Mb genome sequence of Citrobacter freundii strain GTC 09479, isolated from a urine sample collected in 1983 at Gifu University Graduate School of Medicine, Japan. This draft genome consists of 4,899,578 bp with 51.62% G+C, 4,574 predicted CDSs, 72 tRNAs and 10 rRNAs. PMID:26484065

  17. Complete Genome Sequence of Klebsiella pneumoniae YH43

    PubMed Central

    Ogura, Yoshitoshi; Hayashi, Tetsuya; Mizunoe, Yoshimitsu

    2016-01-01

    We report here the complete genome sequence of Klebsiella pneumoniae strain YH43, isolated from sweet potato. The genome consists of a single circular chromosome of 5,520,319 bp in length. It carries 8 copies of rRNA operons, 86 tRNA genes, 5,154 protein-coding genes, and the nif gene cluster for nitrogen fixation. PMID:27081127

  18. Draft Genome Sequence of Mycobacterium austroafricanum DSM 44191

    PubMed Central

    Croce, Olivier; Robert, Catherine; Raoult, Didier

    2014-01-01

    We announce the draft genome sequence of Mycobacterium austroafricanum DSM 44191T (= E9789-SA12441T), a non-tuberculosis species responsible for opportunistic infection. The genome described here has a size of 6,772,357 bp with a G+C content of 66.79% and contains 6,419 protein-coding genes and 112 RNA genes. PMID:24744336
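
    G+C content, reported for this and several other genomes in the list, is the percentage of bases that are G or C. A minimal Python sketch follows; the choice to exclude ambiguous bases such as N from the denominator is an assumption made here, and the function name is illustrative.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called


print(round(gc_content("GGCCAT"), 2))   # 4 of 6 bases are G or C
print(round(gc_content("atgcNN"), 2))   # N is ignored; 2 of 4 called bases
```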

  19. Complete Genome Sequence of Bacillus cereus Bacteriophage BCP78

    PubMed Central

    Lee, Ju-Hoon; Shin, Hakdong; Son, Bokyung

    2012-01-01

    Bacillus cereus is generally found in soil habitats, and it contaminates a wide variety of foods, causing food poisoning with symptoms such as vomiting and diarrhea. To develop a novel biocontrol agent to inhibit this pathogen, bacteriophage BCP78 belonging to the Siphoviridae family was isolated from a fermented food sample. Here we announce the complete genome sequence of BCP78, which may be useful for understanding its inhibition mechanism against B. cereus, and describe major findings from the genome annotation. PMID:22158847

  20. Contribution to Sequencing of the Deinococcus radiodurans Genome

    SciTech Connect

    Minton, K.W.

    1999-03-11

    The stated goal of this project was to supply The Institute for Genomic Research (TIGR) with pure DNA from the bacterium Deinococcus radiodurans R1 for purposes of complete genomic sequencing by TIGR. We subsequently decided to expand this project to include a second goal; this second goal was the development of a NotI chromosomal map of D. radiodurans R1 using Pulsed Field Gel Electrophoresis (PFGE).

  1. Genome Sequence of Corynebacterium ulcerans Strain FRC11

    PubMed Central

    Benevides, Leandro de Jesus; Viana, Marcus Vinicius Canário; Mariano, Diego César Batista; Rocha, Flávia de Souza; Bagano, Priscilla Carolinne; Folador, Edson Luiz; Pereira, Felipe Luiz; Dorella, Fernanda Alves; Leal, Carlos Augusto Gomes; Carvalho, Alex Fiorini; Soares, Siomar de Castro; Carneiro, Adriana; Ramos, Rommel; Badell-Ocando, Edgar; Guiso, Nicole; Silva, Artur; Figueiredo, Henrique; Guimarães, Luis Carlos

    2015-01-01

    Here, we present the genome sequence of Corynebacterium ulcerans strain FRC11. The genome includes one circular chromosome of 2,442,826 bp (53.35% G+C content), and 2,210 genes were predicted, 2,146 of which are putative protein-coding genes, with 12 rRNAs and 51 tRNAs; 1 pseudogene was also identified. PMID:25767241
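
    The feature counts reported above add up to the stated gene total. A short Python check, using only the numbers in the abstract (the dictionary keys are illustrative labels):

```python
# Feature counts as reported in the abstract above.
reported_total_genes = 2_210
features = {
    "protein_coding": 2_146,
    "rRNA": 12,
    "tRNA": 51,
    "pseudogene": 1,
}

feature_sum = sum(features.values())
print("sum of features:", feature_sum)
print("matches reported total:", feature_sum == reported_total_genes)
```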

  2. Sequencing small genomic targets with high efficiency and extreme accuracy

    PubMed Central

    Schmitt, Michael W.; Fox, Edward J.; Prindle, Marc J.; Reid-Bayliss, Kate S.; True, Lawrence D.; Radich, Jerald P.; Loeb, Lawrence A.

    2015-01-01

    The detection of minority variants in mixed samples demands methods for enrichment and accurate sequencing of small genomic intervals. We describe an efficient approach based on sequential rounds of hybridization with biotinylated oligonucleotides, enabling more than one-million fold enrichment of genomic regions of interest. In conjunction with error correcting double-stranded molecular tags, our approach enables the quantification of mutations in individual DNA molecules. PMID:25849638
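
    The error-correction idea behind double-stranded molecular tags can be sketched as a two-level consensus: reads sharing a tag are collapsed per strand, and a base is kept only when both strands agree. The Python below is a simplified illustration under stated assumptions, not the authors' implementation: reads are already grouped by tag, have equal length, and bottom-strand reads have already been reverse-complemented into top-strand orientation. The function names and the 0.7 threshold are hypothetical.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.7):
    """Majority call per position; 'N' where no base reaches min_fraction."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_reads, bottom_reads, min_fraction=0.7):
    """Keep a base only if the two single-strand consensuses agree."""
    top = strand_consensus(top_reads, min_fraction)
    bottom = strand_consensus(bottom_reads, min_fraction)
    return "".join(a if a == b else "N" for a, b in zip(top, bottom))


# One top-strand read carries a random error at position 3 (outvoted);
# every bottom-strand read carries the same change at position 4, as a
# strand-specific artifact would, so that position is masked.
top = ["ACGT", "ACGT", "ACGT", "ACTT"]
bottom = ["ACGA", "ACGA", "ACGA"]
result = duplex_consensus(top, bottom)
print(result)
```

    A true mutation would be present on both strands and would therefore survive both consensus steps, which is what allows rare variants to be separated from sequencing and damage artifacts.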

  3. Transcription of densovirus endogenous sequences in the Myzus persicae genome.

    PubMed

    Clavijo, Gabriel; van Munster, Manuella; Monsion, Baptiste; Bochet, Nicole; Brault, Véronique

    2016-04-01

    Integration of non-retroviral sequences in the genome of different organisms has been observed and, in some cases, a relationship of these integrations with immunity has been established. The genome of the green peach aphid, Myzus persicae (clone G006), was screened for densovirus-like sequence (DLS) integrations. A total of 21 DLSs localized on 10 scaffolds were retrieved that mostly shared sequence identity with two aphid-infecting viruses, Myzus persicae densovirus (MpDNV) and Dysaphis plantaginea densovirus (DplDNV). In some cases, uninterrupted potential ORFs corresponding to non-structural viral proteins or capsid proteins were found within DLSs identified in the aphid genome. In particular, one scaffold harboured a complete virus-like genome, while another scaffold contained two virus-like genomes in reverse orientation. Remarkably, transcription of some of these ORFs was observed in M. persicae, suggesting a biological effect of these viral integrations. In contrast to most of the other densoviruses identified so far that induce acute host infection, it has been reported previously that MpDNV has only a minor effect on M. persicae fitness, while DplDNV can even have a beneficial effect on its aphid host. This suggests that DLS integration in the M. persicae genome may be responsible for the latency of MpDNV infection in the aphid host. PMID:26758080

  4. Genome sequence of the human malaria parasite Plasmodium falciparum

    PubMed Central

    Gardner, Malcolm J.; Hall, Neil; Fung, Eula; White, Owen; Berriman, Matthew; Hyman, Richard W.; Carlton, Jane M.; Pain, Arnab; Nelson, Karen E.; Bowman, Sharen; Paulsen, Ian T.; James, Keith; Eisen, Jonathan A.; Rutherford, Kim; Salzberg, Steven L.; Craig, Alister; Kyes, Sue; Chan, Man-Suen; Nene, Vishvanath; Shallom, Shamira J.; Suh, Bernard; Peterson, Jeremy; Angiuoli, Sam; Pertea, Mihaela; Allen, Jonathan; Selengut, Jeremy; Haft, Daniel; Mather, Michael W.; Vaidya, Akhil B.; Martin, David M. A.; Fairlamb, Alan H.; Fraunholz, Martin J.; Roos, David S.; Ralph, Stuart A.; McFadden, Geoffrey I.; Cummings, Leda M.; Subramanian, G. Mani; Mungall, Chris; Venter, J. Craig; Carucci, Daniel J.; Hoffman, Stephen L.; Newbold, Chris; Davis, Ronald W.; Fraser, Claire M.; Barrell, Bart

    2013-01-01

    The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria. PMID:12368864

  5. The complete mitochondrial genome sequence of the budgerigar, Melopsittacus undulatus.

    PubMed

    Guan, Xiaojing; Xu, Jun; Smith, Edward J

    2016-01-01

    Here, we describe the budgie's mitochondrial genome sequence, a resource that can facilitate this parrot's use as a model organism as well as for determining its phylogenetic relatedness to other parrots/Psittaciformes. The estimated total length of the sequence was 18,193 bp. In addition to the 13 protein and tRNA and rRNA coding regions, the sequence also includes a duplicated hypervariable region, a feature unique to only a few birds. The two hypervariable regions shared a sequence identity of about 86%. PMID:24660934

  6. Circlator: automated circularization of genome assemblies using long sequencing reads.

    PubMed

    Hunt, Martin; Silva, Nishadi De; Otto, Thomas D; Parkhill, Julian; Keane, Jacqueline A; Harris, Simon R

    2015-01-01

    The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/ . PMID:26714481
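    The idea of circularizing a contig and then emitting a linear representation of it can be illustrated with a deliberately naive sketch. This is not Circlator's algorithm (the abstract does not describe its internals); it only assumes that a circular molecule assembled as a linear contig may repeat its own start at its end, and that the circle can then be written out from any chosen start position. The function names and the toy sequence are hypothetical.

```python
def trim_end_overlap(contig: str, min_overlap: int = 4) -> str:
    """Drop a suffix that exactly repeats the contig's own prefix, if one exists."""
    # Try the longest candidate overlap first; it cannot exceed half the contig.
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]
    return contig  # no overlap found: leave the contig unchanged


def rotate(circular: str, start: int) -> str:
    """Linear representation of a circular sequence beginning at index `start`."""
    return circular[start:] + circular[:start]


# Toy contig whose last six bases repeat its first six (hypothetical data).
contig = "ACGTTGCAAGGCT" + "ACGTTG"
circular = trim_end_overlap(contig)
linear = rotate(circular, 3)
```

    Real long-read assemblies contain sequencing errors, so exact string matching as used here would rarely suffice in practice; an alignment-based comparison would be needed.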

  7. A cryptographic approach to securely share and query genomic sequences.

    PubMed

    Kantarcioglu, Murat; Jiang, Wei; Liu, Ying; Malin, Bradley

    2008-09-01

    To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we present a novel cryptographic framework that enables organizations to support genomic data mining without disclosing the raw genomic sequences. Organizations contribute encrypted genomic sequence records into a centralized repository, where the administrator can perform queries, such as frequency counts, without decrypting the data. We evaluate the efficiency of our framework with existing databases of single nucleotide polymorphism (SNP) sequences and demonstrate that the time needed to complete count queries is feasible for real world applications. For example, our experiments indicate that a count query over 40 SNPs in a database of 5000 records can be completed in approximately 30 min with off-the-shelf technology. We further show that approximation strategies can be applied to significantly speed up query execution times with minimal loss in accuracy. The framework can be implemented on top of existing information and network technologies in biomedical environments. PMID:18779075
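    To make the notion of a "count query" concrete, the following is a minimal plaintext sketch of what such a query computes: the number of records whose genotypes match a set of SNP values. It involves no encryption and is not the framework described in the abstract, which evaluates such counts without decrypting the data. The record layout, SNP identifiers and genotype strings are hypothetical.

```python
from typing import Dict, List

# Hypothetical record layout: SNP identifier -> genotype string.
Record = Dict[str, str]


def count_matching(records: List[Record], query: Dict[str, str]) -> int:
    """Count records whose genotype equals the queried value at every queried SNP."""
    return sum(
        all(rec.get(snp) == genotype for snp, genotype in query.items())
        for rec in records
    )


# Toy database of three records (made-up identifiers and genotypes).
records = [
    {"rs1": "AA", "rs2": "CT", "rs3": "GG"},
    {"rs1": "AG", "rs2": "CT", "rs3": "GG"},
    {"rs1": "AA", "rs2": "CT", "rs3": "GT"},
]
matches = count_matching(records, {"rs1": "AA", "rs2": "CT"})
```

    In the setting the abstract describes, only the final count would be revealed to the querier, while the individual records stay encrypted in the repository.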

  8. Complete genome sequence of Pyrolobus fumarii type strain (1AT)

    SciTech Connect

    Anderson, Iain; Goker, Markus; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Ivanova, N; Pagani, Ioanna; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Huber, Harald; Yasawong, Montri; Rohde, Manfred; Spring, Stefan; Abt, Birte; Sikorski, Johannes; Wirth, Reinhard; Detter, J. Chris; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2011-01-01

    Pyrolobus fumarii Blöchl et al. 1997 is the type species of the genus Pyrolobus, which belongs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii currently exhibits the highest optimal growth temperature of all life forms on earth (106 °C). This is the first completed genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on September 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  10. The minimum information about a genome sequence (MIGS) specification

    PubMed Central

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; dePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; Gil, Ingio San; Wilson, Gareth; Wipat, Anil

    2008-01-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases. PMID:18464787

  11. Genome Sequencing and Annotation of Mycobacterium tuberculosis PR08 strain

    PubMed Central

    Jaafar, Mohammad Maaruf; Halim, Mohd Zakihalani A.; Ismail, Mohamad Izwan; Shien, Lee Lian; Kek, Teh Lay; Fong, Ngeow Yun; Nor, Norazmi Mohd; Zainuddin, Zainul Fadziruddin; Hock, Tang Thean; Najimudin, Mohd Nazalan Mohd; Salleh, Mohd Zaki

    2015-01-01

    Mycobacterium tuberculosis is an acid-fast bacterial species in the family Mycobacteriaceae and is the causative agent of most cases of tuberculosis. Here, we report the genomic features of Mycobacterium tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with both pulmonary and extrapulmonary tuberculosis (TB). The isolated strain was identified as Mycobacterium tuberculosis PR08 (MTB PR08). Genomic DNA of the MTB PR08 strain was extracted and subjected to whole genome sequencing using MiSeq (Illumina, CA, USA). The draft genome size of MTB PR08 strain is 4,292,364 bp with a G + C content of 65.2%. This strain was annotated to have 4723 genes and 48 RNAs. This whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number CP010895. PMID:26981383

  12. The minimum information about a genome sequence (MIGS) specification.

    PubMed

    Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; DePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; San Gil, Ingio; Wilson, Gareth; Wipat, Anil

    2008-05-01

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases. PMID:18464787

  13. Genome Sequencing and Annotation of Mycobacterium tuberculosis PR08 strain.

    PubMed

    Jaafar, Mohammad Maaruf; Halim, Mohd Zakihalani A; Ismail, Mohamad Izwan; Shien, Lee Lian; Kek, Teh Lay; Fong, Ngeow Yun; Nor, Norazmi Mohd; Zainuddin, Zainul Fadziruddin; Hock, Tang Thean; Najimudin, Mohd Nazalan Mohd; Salleh, Mohd Zaki

    2016-03-01

    Mycobacterium tuberculosis is an acid-fast bacterial species in the family Mycobacteriaceae and is the causative agent of most cases of tuberculosis. Here, we report the genomic features of Mycobacterium tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with both pulmonary and extrapulmonary tuberculosis (TB). The isolated strain was identified as Mycobacterium tuberculosis PR08 (MTB PR08). Genomic DNA of the MTB PR08 strain was extracted and subjected to whole genome sequencing using MiSeq (Illumina, CA, USA). The draft genome size of MTB PR08 strain is 4,292,364 bp with a G + C content of 65.2%. This strain was annotated to have 4723 genes and 48 RNAs. This whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number CP010895. PMID:26981383

  14. Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)

    SciTech Connect

    Daligault, Hajnalka E.; Lapidus, Alla L.; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Huntemann, Marcel; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Rohde, Manfred; Verbarg, Susanne; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2011-01-01

    Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Haliscomenobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically uncharted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Sequencing a North American yak genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Livestock researchers are beginning to identify beneficial effects of natural genetic variation in livestock. For example, comparing gene sequences from related species has helped identify the underlying mechanisms of traits like coat color, fertility, and disease resistance. Although cattle and y...

  16. Draft genome sequence of Bacillus endophyticus 2102.

    PubMed

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung; Jeong, Haeyoung; Lee, Dong-Woo

    2012-10-01

    Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents. PMID:23012284

  17. Draft Genome Sequence of Bacillus endophyticus 2102

    PubMed Central

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung

    2012-01-01

    Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents. PMID:23012284

  18. Genome Sequence of Propionibacterium acidipropionici ATCC 55737.

    PubMed

    Luna-Flores, Carlos H; Nielsen, Lars K; Marcellin, Esteban

    2016-01-01

    Propionibacterium acidipropionici produces propionic acid as its main fermentation product. Propionic acid has traditionally been derived from fossil fuels, but environmental and sustainability concerns have revived interest in producing it from biological resources. Here, we present the closed sequence of Propionibacterium acidipropionici ATCC 55737, an efficient propionic acid producer. PMID:27198010

  19. MinION nanopore sequencing of an influenza genome

    PubMed Central

    Wang, Jing; Moore, Nicole E.; Deng, Yi-Mo; Eccles, David A.; Hall, Richard J.

    2015-01-01

    Influenza epidemics and pandemics have significant impacts on economies, morbidity and mortality worldwide. The ability to rapidly and accurately sequence influenza viruses is instrumental in the prevention and mitigation of influenza. All eight influenza genes from an influenza A virus were amplified by PCR simultaneously and then subjected to sequencing on a MinION nanopore sequencer. A complete influenza virus genome was obtained that shared greater than 99% identity with sequence data obtained from Illumina MiSeq and traditional Sanger-sequencing. The laboratory infrastructure and computing resources used to perform this experiment on the MinION nanopore sequencer would be available in most molecular laboratories around the world. Using this system, the concept of portability, and thus sequencing influenza viruses in the clinic or field is now tenable. PMID:26347715

  20. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
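    The notion of a constrained column, one that mutates yet preserves a common property, can be sketched as follows. This is an illustration in Python rather than the Prolog toolkit the abstract mentions, it ignores the phylogenetic tree, and the residue property groups are a coarse, hypothetical classification chosen only for the example.

```python
from typing import Dict, List, Set

# Hypothetical, deliberately coarse residue classes (illustration only).
PROPERTY_GROUPS: Dict[str, Set[str]] = {
    "hydrophobic": set("AVILMFWC"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def constrained_columns(alignment: List[str]) -> Dict[int, str]:
    """Map column index -> shared property for columns that vary in residue
    but whose residues all fall inside a single property group."""
    result: Dict[int, str] = {}
    for col in range(len(alignment[0])):
        residues = {row[col] for row in alignment}
        if len(residues) < 2:
            continue  # fully conserved column: no mutation to be constrained
        for name, group in PROPERTY_GROUPS.items():
            if residues <= group:
                result[col] = name
                break
    return result


# Toy protein alignment of three equal-length rows (made-up sequences).
alignment = ["MKVDA", "MRLEA", "MKIDG"]
constrained = constrained_columns(alignment)
```

    Columns containing a gap character are never reported here, because the gap symbol belongs to none of the groups.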

  1. A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL

    PubMed Central

    Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante

    2013-01-01

    We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568

  2. The complete chloroplast genome sequence of Curcuma flaviflora (Curcuma).

    PubMed

    Zhang, Yan; Deng, Jiabin; Li, Yangyi; Gao, Gang; Ding, Chunbang; Zhang, Li; Zhou, Yonghong; Yang, Ruiwu

    2016-09-01

    The complete chloroplast (cp) genome of Curcuma flaviflora, a medicinal plant in Southeast Asia, was sequenced. The genome size was 160 478 bp in length, with 36.3% GC content. A pair of inverted repeats (IRs) of 26 946 bp were separated by a large single copy (LSC) of 88 008 bp and a small single copy (SSC) of 18 578 bp, respectively. The cp genome contained 132 annotated genes, including 79 protein coding genes, 30 tRNA genes, and four rRNA genes. And 19 of these genes were duplicated in inverted repeat regions. PMID:26367332
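    The reported region sizes can be checked against the reported genome size: in a quadripartite chloroplast genome the total length is the large single copy plus the small single copy plus two copies of the inverted repeat. A small sketch of that arithmetic, using only the figures quoted in the abstract (the function name is hypothetical):

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome made of LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


# Figures quoted in the abstract for Curcuma flaviflora (in bp).
total = quadripartite_length(lsc=88_008, ssc=18_578, ir=26_946)
consistent = total == 160_478  # compare with the reported genome size
```

    With these values the sum reproduces the stated 160 478 bp, so the abstract's numbers are internally consistent.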

  3. Genome sequence and description of Aeromicrobium massiliense sp. nov.

    PubMed Central

    Ramasamy, Dhamodharan; Kokcha, Sahare; Lagier, Jean-Christophe; Nguyen, Thi-Thien; Raoult, Didier

    2012-01-01

    Aeromicrobium massiliense strain JC14T sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic rod-shaped gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119 bp long genome contains 3,296 protein-coding and 51 RNA genes. PMID:23408663

  4. The complete chloroplast genome sequence of Chloranthus japonicus.

    PubMed

    Sun, Jing; Zhang, Gang; Li, Yimin; Chen, Ying; Zhang, Xiaofei; Tang, Zhishu; Wu, Haifeng

    2016-09-01

    The complete chloroplast genome of Chloranthus japonicus, an important traditional Chinese herbal medicine, was sequenced and characterized in this study. The genome size is 158,640 bp in length with 38.9% GC content. Two inverted repeats of 26,149 bp are separated by a large single-copy region (87,724 bp) and a small single-copy region (18,618 bp). The genome contains 131 individual genes, including 86 protein-coding genes, 37 tRNA genes and 8 rRNA genes. Eighteen genes contain one or two introns. PMID:25707409

  5. The complete mitochondrial genome sequence of Emperor Penguins (Aptenodytes forsteri).

    PubMed

    Xu, Qiwu; Xia, Yan; Dang, Xiao; Chen, Xiaoli

    2016-09-01

    The emperor penguin (Aptenodytes forsteri) is the largest living species of penguin. Herein, we report the complete mitochondrial genome of the emperor penguin for the first time. The mitochondrial genome is a circular molecule of 17 301 bp in length, consisting of 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and one control region. To verify the accuracy and utility of the newly determined mitogenome sequence, we constructed a species phylogenetic tree of the emperor penguin together with 10 other closely related species. This is the second complete mitochondrial genome of a penguin, and it will be an important resource for studying the mitochondrial evolution of birds. PMID:26403091

  6. Draft genome sequence of Arthrospira platensis C1 (PCC9438).

    PubMed

    Cheevadhanarak, Supapon; Paithoonrangsarid, Kalyanee; Prommeenate, Peerada; Kaewngam, Warunee; Musigkain, Apiluck; Tragoonrung, Somvong; Tabata, Satoshi; Kaneko, Takakazu; Chaijaruwanich, Jeerayut; Sangsrakru, Duangjai; Tangphatsornruang, Sithichoke; Chanprasert, Juntima; Tongsima, Sissades; Kusonmano, Kanthida; Jeamton, Wattana; Dulsawat, Sudarat; Klanchui, Amornpan; Vorapreeda, Tayvich; Chumchua, Vasunun; Khannapho, Chiraphan; Thammarongtham, Chinae; Plengvidhya, Vethachai; Subudhi, Sanjukta; Hongsthong, Apiradee; Ruengjitchatchawalya, Marasri; Meechai, Asawin; Senachak, Jittisak; Tanticharoen, Morakot

    2012-03-19

    Arthrospira platensis is a cyanobacterium that is extensively cultivated outdoors on a large commercial scale for consumption as a food for humans and animals. It can be grown in monoculture under highly alkaline conditions, making it attractive for industrial production. Here we describe the complete genome sequence of A. platensis C1 strain and its annotation. The A. platensis C1 genome contains 6,089,210 bp including 6,108 protein-coding genes and 45 RNA genes, and no plasmids. The genome information has been used for further comparative analysis, particularly of metabolic pathways, photosynthetic efficiency and barriers to gene transfer. PMID:22675597

  7. Draft genome sequence of Arthrospira platensis C1 (PCC9438)

    PubMed Central

    Cheevadhanarak, Supapon; Paithoonrangsarid, Kalyanee; Prommeenate, Peerada; Kaewngam, Warunee; Musigkain, Apiluck; Tragoonrung, Somvong; Tabata, Satoshi; Kaneko, Takakazu; Chaijaruwanich, Jeerayut; Sangsrakru, Duangjai; Tangphatsornruang, Sithichoke; Chanprasert, Juntima; Tongsima, Sissades; Kusonmano, Kanthida; Jeamton, Wattana; Dulsawat, Sudarat; Klanchui, Amornpan; Vorapreeda, Tayvich; Chumchua, Vasunun; Khannapho, Chiraphan; Thammarongtham, Chinae; Plengvidhya, Vethachai; Subudhi, Sanjukta; Hongsthong, Apiradee; Ruengjitchatchawalya, Marasri; Meechai, Asawin; Senachak, Jittisak; Tanticharoen, Morakot

    2012-01-01

    Arthrospira platensis is a cyanobacterium that is extensively cultivated outdoors on a large commercial scale for consumption as a food for humans and animals. It can be grown in monoculture under highly alkaline conditions, making it attractive for industrial production. Here we describe the complete genome sequence of A. platensis C1 strain and its annotation. The A. platensis C1 genome contains 6,089,210 bp including 6,108 protein-coding genes and 45 RNA genes, and no plasmids. The genome information has been used for further comparative analysis, particularly of metabolic pathways, photosynthetic efficiency and barriers to gene transfer. PMID:22675597

  8. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  9. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    PubMed Central

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that the number of promoter-like sequences in the M. hyopneumoniae genome is more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569
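    Building a position-specific scoring matrix from aligned sites and scanning a sequence with it can be sketched as below. This is a generic log-odds formulation under stated assumptions (a uniform background and a pseudocount of 1), not the scoring scheme or thresholds used in the study, and the training hexamers and scanned sequence are made-up toy data.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pssm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column log2-odds scores against a uniform 0.25 background (assumed)."""
    n = len(sites)
    pssm = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            scores[base] = math.log2(freq / 0.25)
        pssm.append(scores)
    return pssm


def best_hit(pssm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Return (start position, score) of the highest-scoring window."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[p + i]] for i in range(width)), p)
        for p in range(len(sequence) - width + 1)
    ]
    score, pos = max(scored)
    return pos, score


# Toy aligned sites and a toy upstream sequence (hypothetical data).
sites = ["TATAAT", "TAAAAT", "TATAAT", "TACAAT"]
pssm = build_pssm(sites)
pos, score = best_hit(pssm, "GGCGCTATAATGCGC")
```

    A genome-wide scan would apply the same window scoring upstream of every coding sequence and keep windows above a chosen cutoff; how that cutoff is set determines how many hits are expected by chance.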

  10. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)

    PubMed Central

    Miller, Webb; Drautz, Daniela I.; Janecka, Jan E.; Lesk, Arthur M.; Ratan, Aakrosh; Tomsho, Lynn P.; Packard, Mike; Zhang, Yeting; McClellan, Lindsay R.; Qi, Ji; Zhao, Fangqing; Gilbert, M. Thomas P.; Dalén, Love; Arsuaga, Juan Luis; Ericson, Per G.P.; Huson, Daniel H.; Helgen, Kristofer M.; Murphy, William J.; Götherström, Anders; Schuster, Stephan C.

    2009-01-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes. PMID:19139089

  11. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus).

    PubMed

    Miller, Webb; Drautz, Daniela I; Janecka, Jan E; Lesk, Arthur M; Ratan, Aakrosh; Tomsho, Lynn P; Packard, Mike; Zhang, Yeting; McClellan, Lindsay R; Qi, Ji; Zhao, Fangqing; Gilbert, M Thomas P; Dalén, Love; Arsuaga, Juan Luis; Ericson, Per G P; Huson, Daniel H; Helgen, Kristofer M; Murphy, William J; Götherström, Anders; Schuster, Stephan C

    2009-02-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes. PMID:19139089
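
    As a rough illustration of the figure quoted above (five differing nucleotides out of 15,452), the sketch below counts mismatches between two pre-aligned sequences and expresses them as a percentage. The function names are hypothetical and are not taken from the paper; the example assumes gap-free, equal-length alignments.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def percent_difference(differences: int, length: int) -> float:
    """Express a number of differing sites as a percentage of the sequence length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * differences / length


if __name__ == "__main__":
    # Figures from the abstract: 5 differing nucleotides out of 15,452.
    print(f"{percent_difference(5, 15_452):.4f}% of sites differ")
    # Toy alignment (made-up sequences) with a single mismatch.
    print(count_differences("ACGTACGT", "ACGTACGA"))
```

    With the abstract's numbers this works out to roughly 0.03% of sites, which is the basis for the "very low genetic diversity" remark.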

  12. Characteristics of cloned repeated DNA sequences in the barley genome

    SciTech Connect

    Anan'ev, E.V.; Bochkanov, S.S.; Ryzhik, M.V.; Sonina, N.V.; Chernyshev, A.I.; Shchipkova, N.I.; Yakovleva, E.Yu.

    1986-12-01

    A partial clone library of barley DNA fragments based on plasmid pBR325 was created. The cloned EcoRI-fragments of chromosomal DNA are from 2 to 14 kbp in length. More than 95% of the barley DNA inserts comprise repeated sequences of different complexity and copy number. Certain of these DNA sequences are from families comprising at least 1% of the barley genome. A significant proportion of the clones hybridize with numerous sets of restriction fragments of genome DNA and they are dispersed throughout the barley chromosomes.

  13. Complete Plastid Genome Sequence of the Brown Alga Undaria pinnatifida

    PubMed Central

    Liu, Tao; Wang, Guoliang; Chi, Shan; Liu, Cui; Wang, Haiyang

    2015-01-01

    In this study, we fully sequenced the circular plastid genome of a brown alga, Undaria pinnatifida. The genome is 130,383 base pairs (bp) in size; it contains a large single-copy (LSC, 76,598 bp) and a small single-copy region (SSC, 42,977 bp), separated by two inverted repeats (IRa and IRb: 5,404 bp). The genome contains 139 protein-coding, 28 tRNA, and 6 rRNA genes; none of these genes contains introns. Organization and gene contents of the U. pinnatifida plastid genome were similar to those of Saccharina japonica. There is a co-linear relationship between the plastid genome of U. pinnatifida and that of three previously sequenced large brown algal species. Phylogenetic analyses of 43 taxa based on 23 plastid protein-coding genes grouped all plastids into a red or green lineage. In the large brown algae branch, U. pinnatifida and S. japonica formed a sister clade with much closer relationship to Ectocarpus siliculosus than to Fucus vesiculosus. For the first time, the start codon ATT was identified in the plastid genome of large brown algae, in the atpA gene of U. pinnatifida. In addition, we found a gene-length change induced by a 3-bp repetitive DNA in ycf35 and ilvB genes of the U. pinnatifida plastid genome. PMID:26426800
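
    The region lengths reported above can be checked against the stated genome size: a quadripartite plastid genome is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short sketch below does that arithmetic; the function name is hypothetical and only the numbers come from the abstract.

```python
# Region lengths in base pairs, as reported in the abstract.
LSC_BP = 76_598          # large single-copy region
SSC_BP = 42_977          # small single-copy region
IR_BP = 5_404            # one inverted repeat (IRa and IRb are the same length)
REPORTED_TOTAL_BP = 130_383


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastid genome with two identical-length inverted repeats."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    total = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
    print(total, "bp; matches reported size:", total == REPORTED_TOTAL_BP)
```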

  14. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der (Lawrence Berkeley Lab., CA)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  15. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  16. The genome sequence of the model ascomycete fungus Podospora anserina

    PubMed Central

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    Background The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope. PMID:18460219

  17. Quantitative assessment of mitochondrial DNA copies from whole genome sequencing

    PubMed Central

    2012-01-01

    Background Mitochondrial dysfunction is associated with various aging diseases. The copy number of mtDNA in human cells may therefore be a potential biomarker for diagnostics of aging. Here we propose a new computational method for the accurate assessment of mtDNA copies from whole genome sequencing data. Results Two families of the human whole genome sequencing datasets from the HapMap and the 1000 Genomes projects were used for the accurate counting of mitochondrial DNA copy numbers. The results revealed the parental mitochondrial DNA copy numbers are significantly lower than that of their children in these samples. There are 8%~21% more copies of mtDNA in samples from the children than from their parents. The experiment demonstrated the possible correlations between the quantity of mitochondrial DNA and aging-related diseases. Conclusions Since the next-generation sequencing technology strives to deliver affordable and non-biased sequencing results, accurate assessment of mtDNA copy numbers can be achieved effectively from the output of whole genome sequencing. We implemented the method as a software package MitoCounter with the source code and user's guide available to the public at http://sourceforge.net/projects/mitocounter/. PMID:23282223
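
    The abstract does not describe how MitoCounter computes its counts, so the following is only a sketch of one commonly used estimate, stated here as an assumption rather than as the authors' method: mtDNA copies per cell are approximated as the ratio of mean mitochondrial read depth to mean autosomal read depth, scaled by the nuclear ploidy. The function name and the example depths are hypothetical.

```python
def estimate_mtdna_copies(mt_mean_depth: float,
                          autosomal_mean_depth: float,
                          ploidy: int = 2) -> float:
    """Coverage-ratio estimate of mtDNA copies per cell (an assumed, generic approach).

    Each nuclear locus is present `ploidy` times per cell, so the depth ratio
    is multiplied by the ploidy to convert it to copies per cell.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    if mt_mean_depth < 0:
        raise ValueError("mitochondrial depth cannot be negative")
    return ploidy * mt_mean_depth / autosomal_mean_depth


if __name__ == "__main__":
    # Made-up depths for illustration only; not data from the study.
    parent = estimate_mtdna_copies(mt_mean_depth=3000.0, autosomal_mean_depth=30.0)
    child = estimate_mtdna_copies(mt_mean_depth=3450.0, autosomal_mean_depth=30.0)
    print(f"parent: {parent:.0f}, child: {child:.0f}, "
          f"relative increase: {100.0 * (child - parent) / parent:.0f}%")
```

    A ratio of this kind is sensitive to how reads mapping to nuclear copies of mitochondrial sequence are handled, so real pipelines typically filter or mask such regions before computing depth.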

  18. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  19. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    PubMed Central

    2011-01-01

    Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930

  20. Mitochondrial DNA sequences in the nuclear genome of a locust.

    PubMed

    Gellissen, G; Bradfield, J Y; White, B N; Wyatt, G R

    The endosymbiotic theory of the origin of mitochondria is widely accepted, and implies that loss of genes from the mitochondria to the nucleus of eukaryotic cells has occurred over evolutionary time. However, evidence at the DNA sequence level for gene transfer between these organelles has so far been limited to a single example, the demonstration that a mitochondrial ATPase subunit gene of Neurospora crassa has an homologous partner in the nuclear genome. From a gene library of the insect, Locusta migratoria, we have now isolated two clones, representing separate fragments of nuclear DNA, which contain sequences homologous to the mitochondrial genes for ribosomal RNA, as well as regions of homology with highly repeated nuclear sequences. The results suggest the transfer of sequences between mitochondrial and nuclear genomes, followed by evolutionary divergence. PMID:6298629

  1. Easy quantitative assessment of genome editing by sequence trace decomposition

    PubMed Central

    Brinkman, Eva K.; Chen, Tao; Amendola, Mario; van Steensel, Bas

    2014-01-01

    The efficacy and the mutation spectrum of genome editing methods can vary substantially depending on the targeted sequence. A simple, quick assay to accurately characterize and quantify the induced mutations is therefore needed. Here we present TIDE, a method for this purpose that requires only a pair of PCR reactions and two standard capillary sequencing runs. The sequence traces are then analyzed by a specially developed decomposition algorithm that identifies the major induced mutations in the projected editing site and accurately determines their frequency in a cell population. This method is cost-effective and quick, and it provides much more detailed information than current enzyme-based assays. An interactive web tool for automated decomposition of the sequence traces is available. TIDE greatly facilitates the testing and rational design of genome editing strategies. PMID:25300484

  2. The first complete genome sequence of iris severe mosaic virus.

    PubMed

    Li, Yongqiang; Deng, Congliang; Shang, Qiaoxia; Zhao, Xiaoli; Liu, Xingliang; Zhou, Qi

    2016-04-01

    The first complete genome sequence of ISMV was determined by deep sequencing of a small RNA library constructed from ISMV-infected samples and rapid amplification of cDNA ends (RACE) PCR. The ISMV genome consists of 10,403 nucleotides excluding the poly(A) tail and contains a large open reading frame encoding a polyprotein of 3316 amino acids. Putative proteolytic cleavage sites were identified by BLAST analysis. The ISMV polyprotein showed highest amino acid sequence identity to that encoded by onion yellow dwarf virus. Phylogenetic analysis of the polyprotein amino acid sequence confirmed that ISMV forms a cluster with shallot yellow stripe virus, Cyrtanthus elatus virus A, narcissus degeneration virus and onion yellow dwarf virus. These results confirm that ISMV is a distinct member of the genus Potyvirus. PMID:26729478

  3. Complete genome sequence of Thauera aminoaromatica strain MZ1T

    PubMed Central

    Jiang, Ke; Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Chang, Y.J.; Larimer, Frank; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; DeBruyn, Jennifer M.; Wang, Ying; Layton, Alice C.; Allen, Michael S.; Sayler, Gary S.

    2012-01-01

    Thauera aminoaromatica strain MZ1T, an isolate belonging to the genus Thauera, the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774. PMID:23407619

  4. Complete genome sequence of Allochromatium vinosum DSM 180T

    SciTech Connect

    Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-Juan; Detter, J. Chris; Han, Cliff; Hauser, Loren John; Jeffries, Cynthia; Land, Miriam L; Munk, Christine; Tapia, Roxanne; Dahl, Christiane

    2011-01-01

    Allochromatium vinosum, formerly Chromatium vinosum, is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium currently contains five species. All members were isolated from freshwater, brackish water or marine habitats and are predominantly obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program.

  5. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that the Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher. PMID:26107775

  6. Complete chloroplast genome sequence of Fritillaria unibracteata var. wabuensis based on SMRT Sequencing Technology.

    PubMed

    Li, Ying; Li, Qiushi; Li, Xiwen; Song, Jingyuan; Sun, Chao

    2016-09-01

    Fritillaria unibracteata var. wabuensis is an important medicinal plant used for the treatment of cough symptoms related to the respiratory system. The chloroplast genome of F. unibracteata var. wabuensis (GenBank accession no. KF769142) was assembled using the PacBio RS platform (Pacific Biosciences, Beverly, MA) as a circular sequence of 151 009 bp. The assembled genome contains 133 genes, including 88 protein-coding, 37 tRNA, and eight rRNA genes. This genome sequence will provide an important resource for further studies on the evolution of the genus Fritillaria and the molecular identification of Fritillaria herbs and their adulterants. This work suggests that PacBio RS is a powerful tool to sequence and assemble chloroplast genomes. PMID:26370383

  7. An algorithm for finding substantially broken repeated sequences in newly sequenced genomes

    NASA Astrophysics Data System (ADS)

    Singh, Abanish; Stojanovic, Nikola

    2008-01-01

    Interspersed repeats occupy a significant fraction of many eukaryotic genomes. They result from the activity and accumulation of transposable elements, sequences which are able to replicate in virtually all organisms and which have been successfully maintained through evolution. With the increasing availability of higher eukaryotic genomes, the identification and annotation of repeats has become an important task in genome biology and it has provoked a shift from the study of individual elements to their genome-wide distributions. In this paper we present a new method for de novo identification of repetitive segments in a genome, particularly suitable for identifying those present in large copy numbers but which have diverged so much that they cannot be recognized by existing techniques, which generally rely on relatively high sequence similarity between the copies.

  8. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  9. Complete Genome Sequence of the WHO International Standard for HIV-2 RNA Determined by Deep Sequencing

    PubMed Central

    Ham, Claire; Morris, Clare

    2016-01-01

    The World Health Organization (WHO) International Standard for HIV-2 RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats (LTRs), including minority species, were assigned subtype A. This information will aid design, development, and evaluation of HIV-2 RNA amplification assays. PMID:26847885

  10. First Complete Genome Sequence of Two Staphylococcus epidermidis Bacteriophages

    PubMed Central

    Daniel, Anu; Bonnen, Penelope E.; Fischetti, Vincent A.

    2007-01-01

    Staphylococcus epidermidis is an important opportunistic pathogen causing nosocomial infections and is often associated with infections in patients with implanted prosthetic devices. A number of virulence determinants have been identified in S. epidermidis, which are typically acquired through horizontal gene transfer. Due to the high recombination potential, bacteriophages play an important role in these transfer events. Knowledge of phage genome sequences provides insights into phage-host biology and evolution. We present the complete genome sequence and a molecular characterization of two S. epidermidis phages, φPH15 (PH15) and φCNPH82 (CNPH82). Both phages belonged to the Siphoviridae family and produced stable lysogens. The PH15 and CNPH82 genomes displayed high sequence homology; however, our analyses also revealed important functional differences. The PH15 genome contained two introns, and in vivo splicing of phage mRNAs was demonstrated for both introns. Secondary structures for both introns were also predicted and showed high similarity to those of Streptococcus thermophilus phage 2972 introns. An additional finding was differential superinfection inhibition between the two phages that corresponded with differences in nucleotide sequence and overall gene content within the lysogeny module. We conducted phylogenetic analyses on all known Siphoviridae, which showed PH15 and CNPH82 clustering with Staphylococcus aureus, creating a novel clade within the S. aureus group and providing a higher overall resolution of the siphophage branch of the phage proteomic tree than previous studies. Until now, no S. epidermidis phage genome sequences have been reported in the literature, and thus this study represents the first complete genomic and molecular description of two S. epidermidis phages. PMID:17172342

  11. Data for identification of porcine X-chromosome inactivation center, XIC, by genomic comparison with human and mouse XIC

    PubMed Central

    Hwang, Jae Yeon; Choi, Kwang-Hwan; Lee, Chang-Kyu

    2015-01-01

    The data included in this article shows homologies of genes in porcine X-chromosome inactivation center, XIC, to each orthologue in human and mouse XIC. Open sequences of XIC-linked genes in human and mouse were compared to porcine genome and sequence homology of each orthologue to porcine genome was calculated. Sequence information of porcine genes encoded in the genomic regions having sequence homology with the human XIC-linked genes and their 2 Kb upstream regions were downloaded. Obtained information was used to design primer pairs for expression and methylation pattern analyses of XIC-linked genes in pigs. The data represented in here is related and applied to the research article entitled “Dosage compensation of X-chromosome inactivation center, XIC,-linked genes in porcine preimplantation embryos: Non-chromosome wide initiation of X-chromosome inactivation in blastocysts”, published in Mechanisms of Development Hwang et al., 2015 [1]. PMID:26793753

  12. Data for identification of porcine X-chromosome inactivation center, XIC, by genomic comparison with human and mouse XIC.

    PubMed

    Hwang, Jae Yeon; Choi, Kwang-Hwan; Lee, Chang-Kyu

    2015-12-01

    The data included in this article shows homologies of genes in porcine X-chromosome inactivation center, XIC, to each orthologue in human and mouse XIC. Open sequences of XIC-linked genes in human and mouse were compared to porcine genome and sequence homology of each orthologue to porcine genome was calculated. Sequence information of porcine genes encoded in the genomic regions having sequence homology with the human XIC-linked genes and their 2 Kb upstream regions were downloaded. Obtained information was used to design primer pairs for expression and methylation pattern analyses of XIC-linked genes in pigs. The data represented in here is related and applied to the research article entitled "Dosage compensation of X-chromosome inactivation center, XIC,-linked genes in porcine preimplantation embryos: Non-chromosome wide initiation of X-chromosome inactivation in blastocysts", published in Mechanisms of Development Hwang et al., 2015 [1]. PMID:26793753

  13. Whole-Genome Sequences of Bacillus subtilis and Close Relatives

    PubMed Central

    Earl, Ashlee M.; Eppinger, Mark; Fricke, W. Florian; Rosovitz, M. J.; Rasko, David A.; Daugherty, Sean; Losick, Richard; Kolter, Roberto

    2012-01-01

    We sequenced four strains of Bacillus subtilis and the type strains for two closely related species, Bacillus vallismortis and Bacillus mojavensis. We report the high-quality Sanger genome sequences of B. subtilis subspecies subtilis RO-NN-1 and AUSI98, B. subtilis subspecies spizizenii TU-B-10T and DV1-B-1, Bacillus mojavensis RO-H-1T, and Bacillus vallismortis DV1-F-3T. PMID:22493193

  14. Whole-genome sequences of Bacillus subtilis and close relatives.

    PubMed

    Earl, Ashlee M; Eppinger, Mark; Fricke, W Florian; Rosovitz, M J; Rasko, David A; Daugherty, Sean; Losick, Richard; Kolter, Roberto; Ravel, Jacques

    2012-05-01

    We sequenced four strains of Bacillus subtilis and the type strains for two closely related species, Bacillus vallismortis and Bacillus mojavensis. We report the high-quality Sanger genome sequences of B. subtilis subspecies subtilis RO-NN-1 and AUSI98, B. subtilis subspecies spizizenii TU-B-10(T) and DV1-B-1, Bacillus mojavensis RO-H-1(T), and Bacillus vallismortis DV1-F-3(T). PMID:22493193

  15. Complete mitochondrial genome sequence of Grundulus bogotensis (Humboldt, 1821).

    PubMed

    Isaza, Juan P; Alzate, Juan F; Maldonado-Ocampo, Javier A

    2016-05-01

    Grundulus bogotensis is an Endangered fish in Colombia. In this study, we report the complete mitochondrial DNA sequence of G. bogotensis. The entire genome comprises 17,123 bases and has a GC content of 39.84%. The mitogenome sequence of G. bogotensis will contribute to a better understanding of the population genetics and evolution of this lineage. The sequence was deposited in the GenBank database under accession number KM677190. PMID:25405907
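
    The GC content quoted in this abstract is the percentage of bases that are G or C. A minimal sketch of that arithmetic follows; the function name gc_content and the toy sequence are illustrative assumptions, not the authors' code or the G. bogotensis mitogenome, and ambiguous bases such as N are simply skipped here.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence.

    Assumption: characters other than A, C, G, T (for example N) are ignored
    rather than counted in the denominator.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence for illustration only: 4 of its 10 bases are G or C.
toy = "ATGCGCATTA"
print(f"{gc_content(toy):.2f}%")  # prints 40.00%
```

    Applied to the full deposited sequence, the same ratio would be taken over all 17,123 bases.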

  16. Library Construction for Mutation Identification by Whole-Genome Sequencing.

    PubMed

    Smith, Harold E

    2015-01-01

    Next-generation sequencing provides a rapid and powerful method for mutation identification. Herein is described a workflow for sample preparation to allow the simultaneous mapping and identification of candidate mutations by whole-genome sequencing in Caenorhabditis elegans. The protocol is designed for small numbers of worms to accommodate classes of mutations, such as lethal and sterile alleles, that are difficult to identify by traditional means. PMID:26423963

  17. The Genomic HyperBrowser: inferential genomics at the sequence level

    PubMed Central

    2010-01-01

    The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no. PMID:21182759

  18. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics

    PubMed Central

    Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K.; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-01-01

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of concerns about formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate a high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C·G > T·A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces a similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes. PMID:26305677
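
    The concordant and discordant call rates mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the authors' exact definition, so the version below is an assumption: calls are keyed by (chromosome, position), only positions called in both samples are compared, and the function name and toy genotypes are hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based coordinate)


def concordance(calls_a: Dict[Position, str],
                calls_b: Dict[Position, str]) -> Tuple[float, float]:
    """Return (concordant rate, discordant rate) over positions called in both samples.

    Assumption: a position is concordant when the two genotype strings are equal.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two samples share no called positions")
    matches = sum(1 for pos in shared if calls_a[pos] == calls_b[pos])
    rate = matches / len(shared)
    return rate, 1.0 - rate


# Toy genotype calls for one matched pair (illustrative values only).
ff = {("chr1", 100): "A/A", ("chr1", 200): "A/G",
      ("chr2", 50): "C/C", ("chr2", 75): "T/T"}
ffpe = {("chr1", 100): "A/A", ("chr1", 200): "A/G",
        ("chr2", 50): "C/C", ("chr2", 75): "C/T",
        ("chr3", 10): "G/G"}  # chr3:10 is called only in FFPE and is skipped

rate, discordant = concordance(ff, ffpe)
print(rate, discordant)  # prints 0.75 0.25
```

    Restricting the comparison to shared positions keeps coverage differences between the two samples from being counted as disagreements; a real analysis might also stratify by reference versus variant positions, as the abstract describes.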

  19. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing aberrant driver regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods designed for SNP arrays cannot be applied directly to next-generation sequencing (NGS) data because of its different characteristics, such as higher-resolution data without predefined probes and reads mapped incorrectly to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  20. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data.

    PubMed

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing aberrant driver regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods designed for SNP arrays cannot be applied directly to next-generation sequencing (NGS) data because of its different characteristics, such as higher-resolution data without predefined probes and reads mapped incorrectly to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852