Science.gov

Sample records for genome sequencing project

  1. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft'; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1); hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the
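The abstract above mentions counting the contigs or genome fragments deposited for a draft genome as one rough indicator of its quality. The following is a minimal sketch of that count, assuming the assembly is available as FASTA text. The function name `contig_lengths` and the two-record sample are hypothetical and are not taken from the record.

```python
def contig_lengths(fasta_text):
    """Return a dict mapping each FASTA record ID to its sequence length."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else f"record_{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Hypothetical two-contig sample, for illustration only.
example = """>contig_1
ACGTACGT
ACGT
>contig_2
GGCC
"""

lengths = contig_lengths(example)
print(f"contigs: {len(lengths)}, total bases: {sum(lengths.values())}")
```

A high contig count relative to genome size would only hint at a fragmented draft; it does not by itself place a genome in any of the categories the abstract refers to.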

  2. Taxonomy becoming a driving force in genome sequencing projects.

    PubMed

    Tamames, Javier; Durante-Rodríguez, Gonzalo

    2013-06-01

    We studied the possible impact of genomic projects by comparing the number of published articles before and after the completion of the project. We found that for most species, there is no significant change in the number of citations. Our study also highlights the growing importance of taxonomy as a main motivation for the sequencing of genomes. Copyright © 2013 Elsevier GmbH. All rights reserved.

  3. The NCGENES project: exploring the new world of genome sequencing.

    PubMed

    Foreman, Ann Katherine M; Lee, Kristy; Evans, James P

    2013-01-01

    Massively parallel sequencing (MPS) is now a clinical reality, promising improved diagnosis, targeted therapies, and population-based screening. To realize the potential of genomics, we must learn how to apply this technology optimally. The NCGENES project is designed to address several challenges that must be overcome in order to integrate MPS into clinical care.

  4. 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects.

    PubMed

    Kumar, Sujai; Schiffer, Philipp H; Blaxter, Mark

    2012-01-01

    Genome sequencing has been democratized by second-generation technologies, and even small labs can sequence metazoan genomes now. In this article, we describe '959 Nematode Genomes'--a community-curated semantic wiki to coordinate the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated with ease compared with a conventional database-driven web site. The wiki also provides a way to track and share preliminary data if those data are not polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. www.nematodegenomes.org.

  5. Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project

    PubMed Central

    Kong, Nguyet; Davis, Matthew; Arabyan, Narine; Huang, Bihua C.; Weis, Allison M.; Chen, Poyin; Thao, Kao; Ng, Whitney; Chin, Ning; Foutouhi, Soraya; Foutouhi, Azarene; Kaufman, James; Xie, Yi; Storey, Dylan B.

    2017-01-01

    Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus. PMID:28705963

  6. 100K Pathogen Genome Project: 306 Listeria Draft Genome Sequences for Food Safety and Public Health.

    PubMed

    Chen, Poyin; Kong, Nguyet; Huang, Bihua; Thao, Kao; Ng, Whitney; Storey, Dylan Bobby; Arabyan, Narine; Foutouhi, Azarene; Foutouhi, Soraya; Weimer, Bart C

    2017-02-09

    Listeria monocytogenes is a food-associated bacterium that is responsible for food-related illnesses worldwide. This is the initial public release of 306 L. monocytogenes genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in L. monocytogenes.

  7. 100K Pathogen Genome Project: 306 Listeria Draft Genome Sequences for Food Safety and Public Health

    PubMed Central

    Chen, Poyin; Kong, Nguyet; Huang, Bihua; Thao, Kao; Ng, Whitney; Storey, Dylan Bobby; Arabyan, Narine; Foutouhi, Azarene; Foutouhi, Soraya

    2017-01-01

    Listeria monocytogenes is a food-associated bacterium that is responsible for food-related illnesses worldwide. This is the initial public release of 306 L. monocytogenes genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in L. monocytogenes. PMID:28183778

  8. Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project.

    PubMed

    Kong, Nguyet; Davis, Matthew; Arabyan, Narine; Huang, Bihua C; Weis, Allison M; Chen, Poyin; Thao, Kao; Ng, Whitney; Chin, Ning; Foutouhi, Soraya; Foutouhi, Azarene; Kaufman, James; Xie, Yi; Storey, Dylan B; Weimer, Bart C

    2017-07-13

    Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus. Copyright © 2017 Kong et al.

  9. 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects

    PubMed Central

    Kumar, Sujai; Schiffer, Philipp H.; Blaxter, Mark

    2012-01-01

    Genome sequencing has been democratized by second-generation technologies, and even small labs can sequence metazoan genomes now. In this article, we describe ‘959 Nematode Genomes’—a community-curated semantic wiki to coordinate the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated with ease compared with a conventional database-driven web site. The wiki also provides a way to track and share preliminary data if those data are not polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. www.nematodegenomes.org. PMID:22058131

  10. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    USDA-ARS's Scientific Manuscript database

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  11. DOE project on genome mapping and sequencing. Progress report, 1992

    SciTech Connect

    Evans, G.A.

    1992-12-31

    These efforts on the human genome project were initiated in September 1990 to contribute towards completion of the human genome project physical mapping effort. In the original application, the authors proposed a novel strategy for constructing a physical map of human chromosome 11, based upon techniques derived in this group and by others. The original goals were to (1) produce a set of cosmid reference clones mapped to specific sites by high resolution fluorescence in situ hybridization, (2) produce a set of associated STS sequences and PCR primers for each site, (3) isolate YAC clones corresponding to each STS, and (4) construct YAC contigs such that > 90% of the chromosome would be covered by contigs of 2 Mb or greater. Since that time, and with the advent of new technology and reagents, the strategy has been modified slightly but still retains the same goals as originally proposed. The authors have added a project to produce chromosome 11-specific cDNAs and determine the map location and DNA sequence of a selected portion of them.

  12. A large and diverse collection of bovine genome sequences from the Canadian Cattle Genome Project.

    PubMed

    Stothard, Paul; Liao, Xiaoping; Arantes, Adriano S; De Pauw, Mary; Coros, Colin; Plastow, Graham S; Sargolzaei, Mehdi; Crowley, John J; Basarab, John A; Schenkel, Flavio; Moore, Stephen; Miller, Stephen P

    2015-01-01

    The Canadian Cattle Genome Project is a large-scale international project that aims to develop genomics-based tools to enhance the efficiency and sustainability of beef and dairy production. Obtaining DNA sequence information is an important part of achieving this goal as it facilitates efforts to associate specific DNA differences with phenotypic variation. These associations can be used to guide breeding decisions and provide valuable insight into the molecular basis of traits. We describe a dataset of 379 whole-genome sequences, taken primarily from key historic Bos taurus animals, along with the analyses that were performed to assess data quality. The sequenced animals represent ten populations relevant to beef or dairy production. Animal information (name, breed, population), sequence data metrics (mapping rate, depth, concordance), and sequence repository identifiers (NCBI BioProject and BioSample IDs) are provided to enable others to access and exploit this sequence information. The large number of whole-genome sequences generated as a result of this project will contribute to ongoing work aiming to catalogue the variation that exists in cattle as well as efforts to improve traits through genotype-guided selection. Studies of gene function, population structure, and sequence evolution are also likely to benefit from the availability of this resource.

  13. Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

    PubMed

    Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

    2006-11-01

    The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice has developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species, primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.
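To put the 6-8X coverage figure in rough perspective, the sketch below uses the idealized Lander-Waterman model. The abstract does not mention this model; it is brought in here as an assumption. Under it, reads land uniformly at random, so the expected fraction of bases covered at least once at coverage c is 1 - e^(-c). The function name `expected_fraction_covered` is hypothetical.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read.

    Assumes the idealized Lander-Waterman model (uniform random read
    placement); real shotgun data deviate from it, for example in repeats.
    """
    return 1.0 - math.exp(-coverage)


for c in (6, 8):
    print(f"{c}X coverage: about {expected_fraction_covered(c):.4%} of bases covered")
```

Under these assumptions the model leaves only a small fraction of bases unsampled at either depth, although it says nothing about how well repeats can be assembled.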

  14. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine

    PubMed Central

    Biesecker, Leslie G.; Mullikin, James C.; Facio, Flavia M.; Turner, Clesson; Cherukuri, Praveen F.; Blakesley, Robert W.; Bouffard, Gerard G.; Chines, Peter S.; Cruz, Pedro; Hansen, Nancy F.; Teer, Jamie K.; Maskeri, Baishali; Young, Alice C.; Manolio, Teri A.; Wilson, Alexander F.; Finkel, Toren; Hwang, Paul; Arai, Andrew; Remaley, Alan T.; Sachdev, Vandana; Shamburek, Robert; Cannon, Richard O.; Green, Eric D.

    2009-01-01

    ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine. PMID:19602640

  15. The Qatar genome project: translation of whole-genome sequencing into clinical practice.

    PubMed

    Zayed, Hatem

    2016-10-01

    The Qatar Genome Project was launched in 2013 with the intent to sequence the genome of each Qatari citizen in an effort to protect Qataris from the high rate of indigenous genetic diseases by allowing the mapping of disease-causing variants/rare variants and establishing a Qatari reference genome. Indeed, this project is expected to have numerous global benefits because of the elevated homogeneity of the Qatari population, which will make Qatar an excellent genetic laboratory. The project will generate a wealth of data that will allow us to make sense of the genotype-phenotype correlations of many diseases, especially the complex multifactorial diseases, and will pave the way for changing the traditional medical practice of looking first at the phenotype rather than the genotype. © 2016 John Wiley & Sons Ltd.

  16. Len Gen: The international lentil genome sequencing project

    USDA-ARS's Scientific Manuscript database

    We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...

  17. Genome Sequencing.

    PubMed

    Verma, Mansi; Kulshrestha, Samarth; Puri, Ayush

    2017-01-01

    Genome sequencing is an important step toward correlating genotypes with phenotypic characters. Sequencing technologies are important in many fields in the life sciences, including functional genomics, transcriptomics, oncology, evolutionary biology, forensic sciences, and many more. The era of sequencing has been divided into three generations. First generation sequencing involved sequencing by synthesis (Sanger sequencing) and sequencing by cleavage (Maxam-Gilbert sequencing). Sanger sequencing led to the completion of various genome sequences (including human) and provided the foundation for development of other sequencing technologies. Since then, various techniques have been developed which can overcome some of the limitations of Sanger sequencing. These techniques are collectively known as "Next-generation sequencing" (NGS), and are further classified into second and third generation technologies. Although NGS methods have many advantages in terms of speed, cost, and parallelism, the accuracy and read length of Sanger sequencing are still superior, which has confined the use of NGS mainly to resequencing genomes. Consequently, there is a continuing need to develop improved real time sequencing techniques. This chapter reviews some of the options currently available and provides a generic workflow for sequencing a genome.

  18. The International Pea Genome Sequencing Project: Sequencing and Assembly Progresses Updates

    USDA-ARS's Scientific Manuscript database

    The International Consortium for the Pea Genome Sequencing (ICPG) includes scientists from six countries around the world. Its aim is to provide a high quality reference of the pea genome to the scientific community as well as to the pea breeder community. The consortium proposed a strategy that int...

  19. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project

    PubMed Central

    Konkel, Miriam K.; Walker, Jerilyn A.; Hotard, Ashley B.; Ranck, Megan C.; Fontenot, Catherine C.; Storer, Jessica; Stewart, Chip; Marth, Gabor T.; Batzer, Mark A.

    2015-01-01

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic “young” Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. PMID:26319576
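The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only restates figures from the abstract. The variable names are hypothetical, and "about 100 kb" is treated as exactly 100,000 bp for illustration.

```python
# Figures quoted in the abstract.
validated_total = 645
validated_intergenic = 611
validated_exon_targeted = 34
sequenced_events = 343
unique_events = 322
added_variation_bp = 100_000  # "about 100 kb", rounded in the abstract

# The intergenic and exon-targeted counts should sum to the validated total.
parts_match = validated_intergenic + validated_exon_targeted == validated_total

# Sequenced events that are not unique.
non_unique = sequenced_events - unique_events

# Average sequence contributed per unique insertion.
mean_bp = added_variation_bp / unique_events

print(f"parts sum to total: {parts_match}")
print(f"non-unique sequenced events: {non_unique}")
print(f"mean bp per unique insertion: {mean_bp:.1f}")
```

The mean of roughly 311 bp per insertion is in line with the approximately 300 bp usually cited for a full-length Alu element, a general figure that is not taken from the abstract.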

  20. Whole Genome Sequencing

    MedlinePlus

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  1. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  2. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

    PubMed

    Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

    2015-08-29

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    PubMed

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only

  4. Genome Sequence of Brevibacillus formosus F12T for a Genome-Sequencing Project for Genomic Taxonomy and Phylogenomics of Bacillus-Like Bacteria

    PubMed Central

    Wang, Jie-Ping; Liu, Guo-Hong; Chen, Qian-qian; Zhu, Yu-jing; Chen, Zheng; Che, Jian-mei

    2015-01-01

    Brevibacillus formosus F12T is a Gram-positive, spore-forming, and strictly aerobic bacterium. Here, we report the draft 6.215-Mb genome sequence of B. formosus F12T, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus-like bacteria, as well as for the functional gene mining and application of B. formosus. PMID:26205874

  5. Sequencing technologies and genome sequencing.

    PubMed

    Pareek, Chandra Shekhar; Smoczynski, Rafal; Tretyn, Andrzej

    2011-11-01

    High-throughput next-generation sequencing (HT-NGS) technologies, which can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method, are currently the hottest topic in human and animal genomics research. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time since 2005, HT-NGS technologies have been revolutionizing human and animal genome research by analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome wide structural variation, de novo assembling and re-assembling of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes and personal genomics. In this review, we address the important features of HT-NGS: first generation DNA sequencers, the birth of HT-NGS, second generation HT-NGS platforms, third generation HT-NGS platforms (including single molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, a comparison of second and third HT-NGS platforms, and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.

  6. Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

    USDA-ARS's Scientific Manuscript database

    The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...

  7. Perspectives from the Avian Phylogenomics Project: Questions that Can Be Answered with Sequencing All Genomes of a Vertebrate Class.

    PubMed

    Jarvis, Erich D

    2016-01-01

    The rapid pace of advances in genome technology, with concomitant reductions in cost, makes it feasible that one day in our lifetime we will have available extant genomes of entire classes of species, including vertebrates. I recently helped cocoordinate the large-scale Avian Phylogenomics Project, which collected and sequenced genomes of 48 bird species representing most currently classified orders to address a range of questions in phylogenomics and comparative genomics. The consortium was able to answer questions not previously possible with just a few genomes. This success spurred on the creation of a project to sequence the genomes of at least one individual of all extant ∼10,500 bird species. The initiation of this project has led us to consider what questions now impossible to answer could be answered with all genomes, and could drive new questions now unimaginable. These include the generation of a highly resolved family tree of extant species, genome-wide association studies across species to identify genetic substrates of many complex traits, redefinition of species and the species concept, reconstruction of the genomes of common ancestors, and generation of new computational tools to address these questions. Here I present visions for the future by posing and answering questions regarding what scientists could potentially do with available genomes of an entire vertebrate class.

  8. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights

    PubMed Central

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H.; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by “embedded bioinformaticians,” i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the “Sequence a genome” class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses. PMID:25745418

  9. The human genome project.

    PubMed Central

    Olson, M V

    1993-01-01

    The Human Genome Project in the United States is now well underway. Its programmatic direction was largely set by a National Research Council report issued in 1988. The broad framework supplied by this report has survived almost unchanged despite an upheaval in the technology of genome analysis. This upheaval has primarily affected physical and genetic mapping, the two dominant activities in the present phase of the project. Advances in mapping techniques have allowed good progress toward the specific goals of the project and are also providing strong corollary benefits throughout biomedical research. Actual DNA sequencing of the genomes of the human and model organisms is still at an early stage. There has been little progress in the intrinsic efficiency of DNA-sequence determination. However, refinements in experimental protocols, instrumentation, and project management have made it practical to acquire sequence data on an enlarged scale. It is also increasingly apparent that DNA-sequence data provide a potent means of relating knowledge gained from the study of model organisms to human biology. There is as yet little indication that the infusion of technology from outside biology into the Human Genome Project has been effectively stimulated. Opportunities in this area remain large, posing substantial technical and policy challenges. PMID:8506271

  10. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  11. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.

  12. Genomic Sequencing in Cancer

    PubMed Central

    Tuna, Musaffe; Amos, Christopher I.

    2013-01-01

    Genomic sequencing has provided critical insights into the etiology of both simple and complex diseases. The enormous reductions in cost for whole genome sequencing have allowed this technology to gain increasing use. Whole genome analysis has impacted research of complex diseases including cancer by allowing the systematic analysis of entire genomes in a single experiment, thereby facilitating the discovery of somatic and germline mutations, and identification of the function and impact of the insertions, deletions, and structural rearrangements, including translocations and inversions, in novel disease genes. Whole-genome sequencing can be used to provide the most comprehensive characterization of the cancer genome, the complexity of which we are only beginning to understand. Hence in this review, we focus on whole-genome sequencing in cancer. PMID:23178448

  13. All about the Human Genome Project (HGP)

    MedlinePlus

    ... Genome Resources Access to the full human sequence All About The Human Genome Project (HGP) The Human ... an international research effort to sequence and map all of the genes - together known as the genome - ...

  14. Sequencing crop genomes: approaches and applications

    USDA-ARS's Scientific Manuscript database

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC-by-BAC approaches were utilized first, and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  15. The Tip of the Iceberg: Clinical Implications of Genomic Sequencing Projects in Head and Neck Cancer

    PubMed Central

    Birkeland, Andrew C.; Ludwig, Megan L.; Meraj, Taha S.; Brenner, J. Chad; Prince, Mark E.

    2015-01-01

    Recent genomic sequencing studies have provided valuable insight into genetic aberrations in head and neck squamous cell carcinoma. Despite these great advances, certain hurdles exist in translating genomic findings to clinical care. Further correlation of genetic findings to clinical outcomes, additional analyses of subgroups of head and neck cancers and follow-up investigation into genetic heterogeneity are needed. While the development of targeted therapy trials is of key importance, numerous challenges exist in establishing and optimizing such programs. This review discusses potential upcoming steps for further genetic evaluation of head and neck cancers and implementation of genetic findings into precision medicine trials. PMID:26506389

  16. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  17. Genetic evolution of pancreatic cancer: lessons learnt from the pancreatic cancer genome sequencing project

    PubMed Central

    Iacobuzio-Donahue, Christine A

    2012-01-01

    Pancreatic cancer is a disease caused by the accumulation of genetic alterations in specific genes. Elucidation of the human genome sequence, in conjunction with technical advances in the ability to perform whole exome sequencing, have provided new insight into the mutational spectra characteristic of this lethal tumour type. Most recently, exomic sequencing has been used to clarify the clonal evolution of pancreatic cancer as well as provide time estimates of pancreatic carcinogenesis, indicating that a long window of opportunity may exist for early detection of this disease while in the curative stage. Moving forward, these mutational analyses indicate potential targets for personalised diagnostic and therapeutic intervention as well as the optimal timing for intervention based on the natural history of pancreatic carcinogenesis and progression. PMID:21749982

  18. Whole-exome/genome sequencing and genomics.

    PubMed

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  19. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

    PubMed Central

    Pruitt, Kim D.; Harrow, Jennifer; Harte, Rachel A.; Wallin, Craig; Diekhans, Mark; Maglott, Donna R.; Searle, Steve; Farrell, Catherine M.; Loveland, Jane E.; Ruef, Barbara J.; Hart, Elizabeth; Suner, Marie-Marthe; Landrum, Melissa J.; Aken, Bronwen; Ayling, Sarah; Baertsch, Robert; Fernandez-Banet, Julio; Cherry, Joshua L.; Curwen, Val; DiCuccio, Michael; Kellis, Manolis; Lee, Jennifer; Lin, Michael F.; Schuster, Michael; Shkeda, Andrew; Amid, Clara; Brown, Garth; Dukhanina, Oksana; Frankish, Adam; Hart, Jennifer; Maidak, Bonnie L.; Mudge, Jonathan; Murphy, Michael R.; Murphy, Terence; Rajan, Jeena; Rajput, Bhanu; Riddick, Lillian D.; Snow, Catherine; Steward, Charles; Webb, David; Weber, Janet A.; Wilming, Laurens; Wu, Wenyu; Birney, Ewan; Haussler, David; Hubbard, Tim; Ostell, James; Durbin, Richard; Lipman, David

    2009-01-01

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. PMID:19498102

  20. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains

    PubMed Central

    Lewis, Tony E.; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L.; Buchan, Daniel W.A.; Chothia, Cyrus; Cuff, Alison; Dana, Jose M.; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T.; Kelley, Lawrence A.; Kleywegt, Gerard J.; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G.; Ochoa-Montaño, Bernardo; Rackham, Owen J. L.; Smith, James; Sternberg, Michael J. E.; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs). PMID:23203986

  1. A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education.

    PubMed

    Oleksyk, Taras K; Pombert, Jean-Francois; Siu, Daniel; Mazo-Vargas, Anyimilehidi; Ramos, Brian; Guiblet, Wilfried; Afanador, Yashira; Ruiz-Rodriguez, Christina T; Nickerson, Michael L; Logue, David M; Dean, Michael; Figueroa, Luis; Valentin, Ricardo; Martinez-Cruzado, Juan-Carlos

    2012-09-28

    Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona, to be studied on a genomic scale. In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which was further scaffolded into 148,255 fragments (N50 = 19,470, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts.
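    The coverage and N50 figures quoted in this abstract follow from simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names and the toy contig lengths are assumptions, and N50 is computed under its usual definition (the length L such that sequences of length L or longer contain at least half of the assembled bases). Because the abstract gives the base count only approximately (~42.5 billion), the computed depth agrees with the reported 26.89x only to within rounding.

```python
def coverage_depth(total_bases: float, genome_size: float) -> float:
    """Average coverage depth = sequenced bases / estimated genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Figures from the abstract: ~42.5 billion bases, estimated genome size 1.58 Gb.
depth = coverage_depth(42.5e9, 1.58e9)
print(f"approximate depth: {depth:.1f}x")

# Hypothetical contig lengths, used only to illustrate the N50 calculation.
toy_contigs = [10, 8, 6, 4, 2]
print("toy N50:", n50(toy_contigs))
```

    Sorting in descending order and accumulating until half of the total is reached keeps the N50 calculation linear after the sort, which is sufficient for assemblies with a few hundred thousand contigs.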

  2. A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education

    PubMed Central

    2012-01-01

    Background: Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona, to be studied on a genomic scale. Findings: In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which was further scaffolded into 148,255 fragments (N50 = 19,470, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. Conclusions: The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts. PMID:23587420

  3. Development of a high density integrated reference genetic linkage map for the multinational Brassica rapa Genome Sequencing Project.

    PubMed

    Li, Xiaonan; Ramchiary, Nirala; Choi, Su Ryun; Van Nguyen, Dan; Hossain, Md Jamil; Yang, Hyeon Kook; Lim, Yong Pyo

    2010-11-01

    We constructed a high-density Brassica rapa integrated linkage map by combining a reference genetic map of 78 doubled haploid lines derived from Chiifu-401-42 × Kenshin (CKDH) and a new map of 190 F2 lines derived from Chiifu-401-42 × rapid cycling B. rapa (CRF2). The integrated map contains 1017 markers and covers 1262.0 cM of the B. rapa genome, with an average interlocus distance of 1.24 cM. High similarity of marker order and position was observed among the linkage groups of the maps with few short-distance inversions. In total, 155 simple sequence repeat (SSR) markers, anchored to 102 new bacterial artificial chromosomes (BACs) and 146 intron polymorphic (IP) markers were mapped in the integrated map, which would be helpful to align the sequenced BACs in the ongoing multinational Brassica rapa Genome Sequencing Project (BrGSP). Further, comparison of the B. rapa consensus map with the 10 B. juncea A-genome linkage groups by using 98 common IP markers showed high-degree colinearity between the A-genome linkage groups, except for a few markers showing inversion or translocation, suggesting that chromosomes are highly conserved between these Brassica species, although they evolved independently after divergence. The sequence information coming out of BrGSP would be useful for B. juncea breeding, and the identified Arabidopsis chromosomal blocks and known quantitative trait loci (QTL) information of B. juncea could be applied to improve other Brassica crops including B. rapa.

  4. Prenatal Whole Genome Sequencing

    PubMed Central

    Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

    2014-01-01

    With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means. PMID:22777977

  5. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  6. Toward nanoscale genome sequencing.

    PubMed

    Ryan, Declan; Rahimi, Maryam; Lund, John; Mehta, Ranjana; Parviz, Babak A

    2007-09-01

    This article reports on the state-of-the-art technologies that sequence DNA using miniaturized devices. The article considers the miniaturization of existing technologies for sequencing DNA and the opportunities for cost reduction that 'on-chip' devices can deliver. The ability to construct nano-scale structures and perform measurements using novel nano-scale effects has provided new opportunities to identify nucleotides directly using physical, and not chemical, methods. The challenges that these technologies need to overcome to provide a US$1000-genome sequencing technology are also presented.

  7. 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project

    PubMed Central

    Cai, Na; Bigdeli, Tim B.; Kretzschmar, Warren W.; Li, Yihan; Liang, Jieqin; Hu, Jingchu; Peterson, Roseann E.; Bacanu, Silviu; Webb, Bradley Todd; Riley, Brien; Li, Qibin; Marchini, Jonathan; Mott, Richard; Kendler, Kenneth S.; Flint, Jonathan

    2017-01-01

    The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low-coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples are collected from 58 hospitals from 23 provinces around China. We are able to call 22 million high quality single nucleotide polymorphisms (SNP) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We use these variants for imputation of genotypes across all samples, and this has allowed us to perform a successful genome wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and evolutionary genetics when integrated with data from other populations. Molecular phenotypes, such as copy number variations and structural variations can be detected, quantified and analysed in similar ways. PMID:28195579

  8. 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project.

    PubMed

    Cai, Na; Bigdeli, Tim B; Kretzschmar, Warren W; Li, Yihan; Liang, Jieqin; Hu, Jingchu; Peterson, Roseann E; Bacanu, Silviu; Webb, Bradley Todd; Riley, Brien; Li, Qibin; Marchini, Jonathan; Mott, Richard; Kendler, Kenneth S; Flint, Jonathan

    2017-02-14

    The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low-coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples are collected from 58 hospitals from 23 provinces around China. We are able to call 22 million high quality single nucleotide polymorphisms (SNP) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We use these variants for imputation of genotypes across all samples, and this has allowed us to perform a successful genome wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and evolutionary genetics when integrated with data from other populations. Molecular phenotypes, such as copy number variations and structural variations can be detected, quantified and analysed in similar ways.

  9. Sequencing the maize genome.

    PubMed

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  10. Towards a reference pecan genome sequence

    USDA-ARS's Scientific Manuscript database

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  11. Defining Genome Project Standards in a New Era of Sequencing (GSC8 Meeting)

    ScienceCinema

    Chain, Patrick [DOE JGI]

    2016-07-12

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year "Research Coordination Network" grant from the National Science Foundation and was held at the DOE Joint Genome Institute, with organizational support provided by the JGI and by the University of California - San Diego.

  12. The snail (Biomphalaria glabrata) genome project.

    PubMed

    Raghavan, Nithya; Knight, Matty

    2006-04-01

    In 2001, ideas for a snail genome project were discussed at the American Society of Parasitologists meeting (New Mexico) and a snail genome consortium was subsequently established (the first consortium meeting was held in 2005). A proposal for sequencing the snail genome was submitted to the National Human Genome Research Institute, and Biomphalaria glabrata was prioritized as a non-mammalian sequencing target in 2004. The sequencing of the genome of this medically important snail is now underway.

  13. Snake Genome Sequencing: Results and Future Prospects

    PubMed Central

    Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

    2016-01-01

    Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957

  14. The Cancer Genome Anatomy Project: EST Sequencing and the Genetics of Cancer Progression

    PubMed Central

    Krizman, David B; Wagner, Lukas; Lash, Alex; Strausberg, Robert L; Emmert-Buck, Michael R

    1999-01-01

    As the process of tumor progression proceeds from the normal cellular state to a preneoplastic condition and finally to the fully invasive form, the molecular characteristics of the cell change as well. These characteristics can be considered a molecular fingerprint of the cell at each stage of progression and, analogous to fingerprinting a criminal, can be used as markers of the progression process. Based on this premise, the Cancer Genome Anatomy Project was initiated with the broad goal of determining the comprehensive molecular characterization of normal, premalignant, and malignant tumor cells, thus making a reality the identification of all major cellular mechanisms leading to tumor initiation and progression ([Strausberg, R.L., Dahl, C.A., and Klausner, R.D. (1997). “New opportunities for uncovering the molecular basis of cancer.” Nat. Genet., 16: 415–516.], www.ncbi.nlm.nih.gov/ncicgap/). The expectation of determining the genetic fingerprints of cancer progression will allow for 1) correlation of disease progression with therapeutic outcome; 2) improved evaluation of disease treatment; 3) stimulation of novel approaches to prevention, detection, and therapy; and 4) enhanced diagnostic tools for clinical applications. Whereas acquiring the comprehensive molecular analysis of cancer progression may take years, results from initial, short-term goals are currently being realized and are proving very fruitful. PMID:10933042

  15. The Cancer Genome Anatomy Project: EST sequencing and the genetics of cancer progression.

    PubMed

    Krizman, D B; Wagner, L; Lash, A; Strausberg, R L; Emmert-Buck, M R

    1999-06-01

    As the process of tumor progression proceeds from the normal cellular state to a preneoplastic condition and finally to the fully invasive form, the molecular characteristics of the cell change as well. These characteristics can be considered a molecular fingerprint of the cell at each stage of progression and, analogous to fingerprinting a criminal, can be used as markers of the progression process. Based on this premise, the Cancer Genome Anatomy Project was initiated with the broad goal of determining the comprehensive molecular characterization of normal, premalignant, and malignant tumor cells, thus making a reality the identification of all major cellular mechanisms leading to tumor initiation and progression ([Strausberg, R.L., Dahl, C.A., and Klausner, R.D. (1997). "New opportunities for uncovering the molecular basis of cancer." Nat. Genet., 16: 415-516.], www.ncbi.nlm.nih.gov/ncicgap/). The expectation of determining the genetic fingerprints of cancer progression will allow for 1) correlation of disease progression with therapeutic outcome; 2) improved evaluation of disease treatment; 3) stimulation of novel approaches to prevention, detection, and therapy; and 4) enhanced diagnostic tools for clinical applications. Whereas acquiring the comprehensive molecular analysis of cancer progression may take years, results from initial, short-term goals are currently being realized and are proving very fruitful.

  16. The human genome project

    SciTech Connect

    Yager, T.D.; Zewert, T.E.; Hood, L.E.

    1994-04-01

    The Human Genome Project (HGP) is a coordinated worldwide effort to precisely map the human genome and the genomes of selected model organisms. The first explicit proposal for this project dates from 1985 although its foundations (both conceptual and technological) can be traced back many years in genetics, molecular biology, and biotechnology. The HGP has matured rapidly and is producing results of great significance.

  17. Towards Sequencing Cotton (Gossypium) Genomes

    USDA-ARS's Scientific Manuscript database

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  18. The Pediatric Cancer Genome Project

    PubMed Central

    Downing, James R; Wilson, Richard K; Zhang, Jinghui; Mardis, Elaine R; Pui, Ching-Hon; Ding, Li; Ley, Timothy J; Evans, William E

    2013-01-01

    The St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (PCGP) is participating in the international effort to identify somatic mutations that drive cancer. These cancer genome sequencing efforts will not only yield an unparalleled view of the altered signaling pathways in cancer but should also identify new targets against which novel therapeutics can be developed. Although these projects are still deep in the phase of generating primary DNA sequence data, important results are emerging and valuable community resources are being generated that should catalyze future cancer research. We describe here the rationale for conducting the PCGP, present some of the early results of this project and discuss the major lessons learned and how these will affect the application of genomic sequencing in the clinic. PMID:22641210

  19. Patients’ perceived utility of whole-genome sequencing for their healthcare: findings from the MedSeq project

    PubMed Central

    Lupo, Philip J; Robinson, Jill O; Diamond, Pamela M; Jamal, Leila; Danysh, Heather E; Blumenthal-Barby, Jennifer; Lehmann, Lisa Soleymani; Vassy, Jason L; Christensen, Kurt D; Green, Robert C; McGuire, Amy L

    2016-01-01

    Aim: To evaluate patients’ expectations regarding the perceived utility of whole-genome sequencing (WGS). Materials & methods: We used latent class analysis to characterize individuals enrolled in the MedSeq Project based on their perceived utility of WGS. Multinomial logistic regression was used to evaluate associations between participant characteristics and latent classes. Results: Findings characterized participants into one of three perceived utility groups: enthusiasts, who had a high probability of agreement with all utility items (23%); health conscious, who perceived utility in medically related areas (60%); or skeptics, who had a low probability of agreement with utility items (17%). Trust significantly predicted latent class. Conclusion: Understanding differences in perceived utility of WGS may inform strategies for uptake of this technology. PMID:27019659

  20. Multiplexed fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  1. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the

  2. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1 error/2,000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
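
    To put the two error rates quoted above into concrete terms, the following sketch computes the expected number of consensus errors for a genome of a given size at the draft rate (about 1 error per 2,000 bp) and the finished rate (about 1 error per 10,000 bp). The 5 Mb genome size and the function name are hypothetical values chosen only for illustration.

```python
def expected_errors(genome_size_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors at a rate of 1 error per bp_per_error bases."""
    if bp_per_error <= 0:
        raise ValueError("bp_per_error must be positive")
    return genome_size_bp / bp_per_error


if __name__ == "__main__":
    genome_size = 5_000_000  # hypothetical 5 Mb microbial genome
    draft = expected_errors(genome_size, 2_000)      # draft rate from the abstract
    finished = expected_errors(genome_size, 10_000)  # finished rate from the abstract
    print(f"draft: ~{draft:.0f} errors, finished: ~{finished:.0f} errors")
```

    For this assumed genome size, the quoted rates correspond to about 2,500 expected errors in a draft consensus versus about 500 in a finished one.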

  3. Opportunities and challenges for the integration of massively parallel genomic sequencing into clinical practice: lessons from the ClinSeq project.

    PubMed

    Biesecker, Leslie G

    2012-04-01

    The debate surrounding the return of results from high-throughput genomic interrogation encompasses many important issues including ethics, law, economics, and social policy. The debate is also informed by the molecular, genetic, and clinical foundations of the emerging field of clinical genomics, which is based on this new technology. This article outlines the main biomedical considerations of sequencing technologies and demonstrates some of the early clinical experiences with the technology to enable the debate to stay focused on real-world practicalities. These experiences are based on early data from the ClinSeq project, which is a project to pilot the use of massively parallel sequencing in a clinical research context with a major aim to develop modes of returning results to individual subjects. The study has enrolled >900 subjects and generated exome sequence data on 572 subjects. These data are beginning to be interpreted and returned to the subjects, which provides examples of the potential usefulness and pitfalls of clinical genomics. There are numerous genetic results that can be readily derived from a genome including rare, high-penetrance traits, and carrier states. However, much work needs to be done to develop the tools and resources for genomic interpretation. The main lesson learned is that a genome sequence may be better considered as a health-care resource, rather than a test, one that can be interpreted and used over the lifetime of the patient.

  4. Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project

    PubMed Central

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Diaz, George A; Zinberg, Randi E; Ferryman, Kadija; Wasserstein, Melissa; Kasarskis, Andrew; Schadt, Eric E

    2016-01-01

    Whole exome/genome sequencing (WES/WGS) is increasingly offered to ostensibly healthy individuals. Understanding the motivations and concerns of research participants seeking out personal WGS and their preferences regarding return-of-results and data sharing will help optimize protocols for WES/WGS. Baseline interviews including both qualitative and quantitative components were conducted with research participants (n=35) in the HealthSeq project, a longitudinal cohort study of individuals receiving personal WGS results. Data sharing preferences were recorded during informed consent. In the qualitative interview component, the dominant motivations that emerged were obtaining personal disease risk information, satisfying curiosity, contributing to research, self-exploration and interest in ancestry, and the dominant concern was the potential psychological impact of the results. In the quantitative component, 57% endorsed concerns about privacy. Most wanted to receive all personal WGS results (94%) and their raw data (89%); a third (37%) consented to having their data shared to the Database of Genotypes and Phenotypes (dbGaP). Early adopters of personal WGS in the HealthSeq project express a variety of health- and non-health-related motivations. Almost all want all available findings, while also expressing concerns about the psychological impact and privacy of their results. PMID:26036856

  5. Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project.

    PubMed

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Diaz, George A; Zinberg, Randi E; Ferryman, Kadija; Wasserstein, Melissa; Kasarskis, Andrew; Schadt, Eric E

    2016-01-01

    Whole exome/genome sequencing (WES/WGS) is increasingly offered to ostensibly healthy individuals. Understanding the motivations and concerns of research participants seeking out personal WGS and their preferences regarding return-of-results and data sharing will help optimize protocols for WES/WGS. Baseline interviews including both qualitative and quantitative components were conducted with research participants (n=35) in the HealthSeq project, a longitudinal cohort study of individuals receiving personal WGS results. Data sharing preferences were recorded during informed consent. In the qualitative interview component, the dominant motivations that emerged were obtaining personal disease risk information, satisfying curiosity, contributing to research, self-exploration and interest in ancestry, and the dominant concern was the potential psychological impact of the results. In the quantitative component, 57% endorsed concerns about privacy. Most wanted to receive all personal WGS results (94%) and their raw data (89%); a third (37%) consented to having their data shared to the Database of Genotypes and Phenotypes (dbGaP). Early adopters of personal WGS in the HealthSeq project express a variety of health- and non-health-related motivations. Almost all want all available findings, while also expressing concerns about the psychological impact and privacy of their results.

  6. The 1000 bull genome project

    USDA-ARS's Scientific Manuscript database

    To meet growing global demands for high value protein from milk and meat, rates of genetic gain in domestic cattle must be accelerated. At the same time, animal health and welfare must be considered. The 1000 bull genomes project supports these goals by providing annotated sequence variants and ge...

  7. Biodiversity, genomes, and DNA sequence databases.

    PubMed

    Leipe, D D

    1996-12-01

    There are approximately 1.4 million organisms on this planet that have been described morphologically but there is no comparable coverage of biodiversity at the molecular level. Little more than 1% of the known species have been subject to any molecular scrutiny and eukaryotic genome projects have focused on a group of closely related model organisms. The past year, however, has seen an approximately 80% increase in the number of species represented in sequence databases and the completion of the sequencing of three prokaryotic genomes. Large-scale sequencing projects seem set to begin coverage of a wider range of the eukaryotic diversity, including green plants, microsporidians and diplomonads.

  8. From sequence mapping to genome assemblies.

    PubMed

    Otto, Thomas D

    2015-01-01

    The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.

  9. General method of rapid Smith/Birnstiel mapping adds for gap closure in shotgun microbial genome sequencing projects: application to Pseudomonas putida KT2440.

    PubMed

    Weinel, C; Tümmler, B; Hilbert, H; Nelson, K E; Kiewitz, C

    2001-11-15

    A physical mapping strategy has been developed to verify and accelerate the assembly and gap closure phase of a microbial genome shotgun-sequencing project. The protocol was worked out during the ongoing Pseudomonas putida KT2440 genome project. A macro-restriction map was constructed by linking probe hybridisation of SwaI- or I-CeuI-restricted chromosomes to serve as a backbone for the quick quality control of sequence and contig assemblies. The library of PCR-generated SwaI linking probes was derived from the sequence assembly after 3- and 6-fold genome coverage. In order to support gap closure in regions with ambiguous assemblies such as the repetitive sequence of the seven ribosomal operons, high-resolution Smith/Birnstiel maps were generated by Southern hybridisation of pulsed-field gel electrophoresis-separated rare-cutter complete/frequent-cutter partial digestions with rare-cutter fragment end probes. Overall 1.5 Mb of the 6.1 Mb P.putida KT2440 genome has been subjected to high-resolution physical mapping in order to align assemblies generated from shotgun sequencing.

  10. Targeted sequencing of plant genomes

    Treesearch

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  11. Venturia carpophila draft genome sequence

    USDA-ARS's Scientific Manuscript database

    Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...

  12. Detection of genes in Escherichia coli sequences determined by genome projects and prediction of protein production levels, based on multivariate diversity in codon usage.

    PubMed

    Kanaya, S; Kudo, Y; Nakamura, Y; Ikemura, T

    1996-06-01

    We used principal component analysis to develop measures (called Z-parameters in this study) which reflect the diversity of codon usage in Escherichia coli genes. Protein production levels for 1500 CDSs (protein-coding sequences) identified by E.coli genome projects in Japan and the US were estimated from a correlation equation between Z1 and cellular protein content obtained through analysis of the genes experimentally characterized. Through the profile analysis of Z1 for E.coli sequences obtained by the Japanese Project, we predicted an additional 36 CDSs that had not been annotated in the International DNA Database. Thirty-one out of the 36 CDSs could be assigned to presumptive protein genes through a BLASTX search for recent protein databases in the Genome Net in Japan. Detailed examination of the Z1-parameter profile led us to assess sequencing errors which cause frame-shift.
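
    The Z-parameters above are described as coming from a principal component analysis of codon usage. The sketch below illustrates only a plausible first step, tabulating relative codon frequencies for one coding sequence; it is not the authors' method, the PCA itself is not shown, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in a coding sequence.

    Reads the sequence in frame from position 0 and ignores any
    trailing bases that do not form a complete codon.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    toy_cds = "ATGAAAAAATAA"  # hypothetical toy sequence: ATG AAA AAA TAA
    for codon, freq in sorted(codon_frequencies(toy_cds).items()):
        print(codon, freq)
```

    A per-gene vector of such frequencies is the kind of input on which a principal component analysis could then be run across many genes.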

  13. Human Genome Project

    SciTech Connect

    Block, S.; Cornwall, J.; Dally, W.; Dyson, F.; Fortson, N.; Joyce, G.; Kimble, H. J.; Lewis, N.; Max, C.; Prince, T.; Schwitters, R.; Weinberger, P.; Woodin, W. H.

    1998-01-04

    The study reviews Department of Energy supported aspects of the United States Human Genome Project, the joint National Institutes of Health/Department of Energy program to characterize all human genetic material, to discover the set of human genes, and to render them accessible for further biological study. The study concentrates on issues of technology, quality assurance/control, and informatics relevant to current effort on the genome project and needs beyond it. Recommendations are presented on areas of the genome program that are of particular interest to and supported by the Department of Energy.

  14. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  15. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project

    PubMed Central

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-01-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen. PMID:28051073

  16. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project.

    PubMed

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-02-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen.

  17. Malaria Genome Sequencing Project

    DTIC Science & Technology

    2004-01-01

    Fawcett T (1992). The biochemistry and molecular Bracchi V, Langsley G, et al. (1996). PfKIN, an SNF1 type pro- biology of plant lipid biosynthesis...176 (1994). some 2 bands from five gels were adjusted to 0.3 M and G. Langsley, Nucleic Acids Res. 16, 4331 (1988), 23. L. Aravind et al., data not...S., Ralph, S. A., McFadden, G. I., Cummings, L. M., Subramanian, G. M., Mungall, C., Venter, J. C., Carucci, D. J., Hoffman, S. L., Newbold, C., Davis

  18. Genome sequencing of the important oilseed crop Sesamum indicum L

    PubMed Central

    2013-01-01

    The Sesame Genome Working Group (SGWG) has been formed to sequence and assemble the sesame (Sesamum indicum L.) genome. The status of this project and our planned analyses are described. PMID:23369264

  19. Genomic standards consortium projects.

    PubMed

    Field, Dawn; Sterk, Peter; Kottmann, Renzo; De Smet, J Wim; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R; Davies, Neil; Dawyndt, Peter; Garrity, George M; Gilbert, Jack A; Glöckner, Frank Oliver; Hirschman, Lynette; Klenk, Hans-Peter; Knight, Rob; Kyrpides, Nikos; Meyer, Folker; Karsch-Mizrachi, Ilene; Morrison, Norman; Robbins, Robert; San Gil, Inigo; Sansone, Susanna; Schriml, Lynn; Tatusova, Tatiana; Ussery, Dave; Yilmaz, Pelin; White, Owen; Wooley, John; Caporaso, Gregory

    2014-06-15

    The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.

  20. [RadGenomics project].

    PubMed

    Iwakawa, Mayumi; Imai, Takashi; Harada, Yoshinobu; Ban, Sadayuki; Michikawa, Yu-ichi; Saegusa, Kumiko; Sagara, Masasi; Tsuji, Atsushi; Noda, Shuhei; Ishikawa, Atsuko

    2002-08-01

    Human health conditions are largely determined by a complex interplay among genetic susceptibility, environmental factors, and aging. The RadGenomics project, which began in April 2001, promotes analysis of genes in response to irradiation, identification of their allelic variants in the human population, development of an effective procedure for quantitating individual radio-sensitivity, and analysis of the interrelationship between genetic heterogeneity and susceptibility to irradiation. Major groups of genes with which the project will concern itself include DNA repair genes, cell cycle genes, oncogenes, tumor suppressor genes, genes for programmed cell death, genes for signal transduction, and genes for oxidative processes. The outcome of the RadGenomics project should lead to improved protocols for personalized radiotherapy and reduce the possible side effects of treatment. The project will contribute to future research on the molecular mechanisms of radiation sensitivity in humans and stimulate the development of new high-throughput technology for a broader application of the biological and medical sciences. Identification of functionally important polymorphisms in the radiation response genes may determine individual differences in sensitivity to radiation exposure. The staff members, who are specialists in a variety of fields including genome science, radiation biology, medical science, molecular biology, and bioinformatics, have come to the RadGenomics project from various universities, companies, and research institutes.

  1. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  2. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  3. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  4. Identification and annotation of repetitive sequences in fungal genomes

    USDA-ARS's Scientific Manuscript database

    Cheaper and faster sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to the ever-increasing volume of genomic data. This has been paralleled by an increase in computational power and resources to process and translate raw sequence data int...

  5. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.
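
    The probabilistic segmentation idea in the abstract above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the author's algorithm: it assumes a fixed, already-known dictionary (the abstract describes building the dictionary concurrently, which is not attempted here), and the function name `sequence_likelihood` and the toy word probabilities are hypothetical. It sums the probability of every way of cutting a sequence into dictionary words, which is the partition-function-style quantity a statistical mechanics model of segmentation would work with.

```python
def sequence_likelihood(seq: str, word_probs: dict[str, float]) -> float:
    """Sum, over all segmentations of `seq` into dictionary words,
    of the product of the word probabilities (hypothetical helper)."""
    max_len = max(len(word) for word in word_probs)
    # z[i] holds the summed probability of all segmentations of seq[:i].
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(seq) + 1):
        for length in range(1, min(max_len, end) + 1):
            word = seq[end - length:end]
            prob = word_probs.get(word)
            if prob:
                # Extend every segmentation of the shorter prefix by `word`.
                z[end] += z[end - length] * prob
    return z[len(seq)]


# Toy dictionary with made-up probabilities, for illustration only.
toy_dictionary = {"A": 0.5, "T": 0.3, "AT": 0.2}
likelihood = sequence_likelihood("ATA", toy_dictionary)
```

    With the toy dictionary, "ATA" can be read as A|T|A or AT|A, so the result is the sum of those two products. A full method would also need to re-estimate the word probabilities and propose new words, which this sketch leaves out.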

  6. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  7. Does less mean more for the genome projects? Large-scale sequencing requires smaller, more powerful processors

    SciTech Connect

    Hodgson, J.

    1995-03-01

    Maynard Olson has very definite views on the status of DNA sequence technology in relation to the demands of the human genome program. "Efforts to automate conventional technology have almost totally failed... What is currently called 'automation' involves the use of alleged labor-saving devices. These have little positive effect - and may even have a negative effect - in embedding to an ever-increasing degree the molecular biologist in the process of sequencing." That is something that Olson, now working with Leroy Hood at the department of molecular biotechnology at the University of Washington (Seattle, WA), plans to change. The department, established in 1991, and partially funded by a $12 million gift from Microsoft's (Seattle, WA) William Gates III, will take a multidisciplinary, technology-led approach to sequencing development. Olson thinks that the technology that eventually emerges will be "conventional" only in the sense that it will involve Sanger sequencing and electrophoretic separation in some form. 3 refs.

  8. Genome sequences and great expectations

    PubMed Central

    Iliopoulos, Ioannis; Tsoka, Sophia; Andrade, Miguel A; Janssen, Paul; Audit, Benjamin; Tramontano, Anna; Valencia, Alfonso; Leroy, Christophe; Sander, Chris; Ouzounis, Christos A

    2001-01-01

    To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function. PMID:11178275

  9. Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

    PubMed Central

    Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

    2013-01-01

    Background: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding: Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is presented by the ratio of chromosome size and genome size (or between scaffold size and genome size), which ranged from 55.31% to nearly 100%. The accuracy of genome sequence was presented by the ratio between matched EST and selected ESTs where 52.93% ∼ 98.28% and 89.02% ∼ 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influences genome sequence assembly. Conclusion: The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published. PMID:23922843
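
    The two quality measures defined in the abstract above are simple ratios, and a minimal sketch may make them concrete. The function names and the input numbers below are hypothetical and chosen only for illustration; they are not taken from the paper's data, and the paper's actual pipeline (EST cleaning, mapping criteria) is not reproduced.

```python
def integrity(assembled_bp: int, genome_bp: int) -> float:
    """Assembled (chromosome or scaffold) size as a percentage of the
    estimated genome size."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * assembled_bp / genome_bp


def accuracy(matched_ests: int, selected_ests: int) -> float:
    """Percentage of the randomly selected ESTs that map to the assembly."""
    if selected_ests <= 0:
        raise ValueError("selected_ests must be positive")
    return 100.0 * matched_ests / selected_ests


# Hypothetical inputs, not values from the study.
example_integrity = integrity(assembled_bp=119_000_000, genome_bp=125_000_000)
example_accuracy = accuracy(matched_ests=9_828, selected_ests=10_000)
```

    Both helpers return percentages, matching the way the abstract reports its ranges.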

  10. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  11. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  12. Complete genome sequence of Acetohalobium arabaticum type strain (Z-7288).

    PubMed

    Sikorski, Johannes; Lapidus, Alla; Chertkov, Olga; Lucas, Susan; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Brambilla, Evelyne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Bruce, David; Detter, Chris; Tapia, Roxanne; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Göker, Markus; Spring, Stefan; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-08-20

    Acetohalobium arabaticum Zhilina and Zavarzin 1990 is of special interest because of its physiology and its participation in the anaerobic C(1)-trophic chain in hypersaline environments. This is the first completed genome sequence of the family Halobacteroidaceae and only the second genome sequence in the order Halanaerobiales. The 2,469,596 bp long genome with its 2,353 protein-coding and 90 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  15. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534
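
    The segment lengths quoted in the abstract above can be checked against the stated genome size with a little arithmetic. The sketch below assumes the usual reading of "flanked by terminal and internal inverted repeats", namely that each unique region is bordered by two copies of its repeat (38 bp for the unique long region, 10.0 kbp for the unique short region); the constant and function names are hypothetical.

```python
# Segment lengths in kbp, as quoted in the abstract.
UNIQUE_LONG_KBP = 97.5
UNIQUE_SHORT_KBP = 7.7
LONG_REPEAT_KBP = 0.038   # 38 bp repeat flanking the unique long region
SHORT_REPEAT_KBP = 10.0   # repeat flanking the unique short region


def genome_size_kbp(unique_long: float, unique_short: float,
                    long_repeat: float, short_repeat: float) -> float:
    """Total length, assuming two copies of each inverted repeat."""
    return unique_long + unique_short + 2 * long_repeat + 2 * short_repeat


total_kbp = genome_size_kbp(UNIQUE_LONG_KBP, UNIQUE_SHORT_KBP,
                            LONG_REPEAT_KBP, SHORT_REPEAT_KBP)
print(f"{total_kbp:.1f} kbp")
```

    Under that assumption the segments sum to roughly 125.3 kbp, which is consistent with the estimate of 125 kbp given in the abstract.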

  16. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  17. Fusicladium effusum draft genome sequence

    USDA-ARS's Scientific Manuscript database

    The pecan scab fungus (Fusicladium effusum [G. Winter]) is an economically important pathogen of pecan (Carya illinoinensis [Wangenh]. K. Koch), on account of its impact on yield and quality of valuable nutmeats. We describe the first draft genome sequence of F. effusum, the characteristics of annot...

  18. Selection of sequence variants to improve dairy cattle genomic predictions

    USDA-ARS's Scientific Manuscript database

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  19. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects.

    PubMed

    Papanicolaou, Alexie

    2016-01-01

    Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called "genome projects". The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.

  20. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    PubMed Central

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  1. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    PubMed

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  2. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc²155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  3. Genome Sequence of Mycobacteriophage Momo.

    PubMed

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  4. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create an expressed sequence tag, or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.

  5. Personal genome sequencing: current approaches and challenges

    PubMed Central

    Snyder, Michael; Du, Jiang; Gerstein, Mark

    2010-01-01

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., “personal genomes.” Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences. PMID:20194435

  6. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects

    PubMed Central

    Papanicolaou, Alexie

    2016-01-01

    Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called “genome projects”. The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure. PMID:27006757

  7. The Human Genome Diversity Project: past, present and future.

    PubMed

    Cavalli-Sforza, L Luca

    2005-04-01

    The Human Genome Project, in accomplishing its goal of sequencing one human genome, heralded a new era of research, a component of which is the systematic study of human genetic variation. Despite delays, the Human Genome Diversity Project has started to make progress in understanding the patterns of this variation and its causes, and also promises to provide important information for biomedical studies.

  8. Selecting sequence variants to improve genomic predictions for dairy cattle

    USDA-ARS's Scientific Manuscript database

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  9. Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

    USDA-ARS's Scientific Manuscript database

    We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...

  10. Some problems with a crash program. [Human Genome Project]

    SciTech Connect

    Davis, B.D.

    1991-01-01

    This brief article describes the controversy over funding for the Human Genome Project. The author states that if the goal of the project is to understand the human genome by relating structure to function, it would not appear justifiable to invest massive amounts of money now in improving sequencing techniques. Complete sequencing data may emerge eventually from expanding regions of interest.

  11. Next-generation sequencing strategies for characterizing the turkey genome.

    PubMed

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  12. The Materials Genome Project

    NASA Astrophysics Data System (ADS)

    Aourag, H.

    2008-09-01

    In the past, the search for new and improved materials was characterized mostly by the use of empirical, trial-and-error methods. This picture of materials science has been changing as the knowledge and understanding of fundamental processes governing a material's properties and performance (namely, composition, structure, history, and environment) have increased. In a number of cases, it is now possible to predict a material's properties before it has even been manufactured thus greatly reducing the time spent on testing and development. The objective of modern materials science is to tailor a material (starting with its chemical composition, constituent phases, and microstructure) in order to obtain a desired set of properties suitable for a given application. In the short term, the traditional "empirical" methods for developing new materials will be complemented to a greater degree by theoretical predictions. In some areas, computer simulation is already used by industry to weed out costly or improbable synthesis routes. Can novel materials with optimized properties be designed by computers? Advances in modelling methods at the atomic level coupled with rapid increases in computer capabilities over the last decade have led scientists to answer this question with a resounding "yes". The ability to design new materials from quantum mechanical principles with computers is currently one of the fastest growing and most exciting areas of theoretical research in the world. The methods allow scientists to evaluate and prescreen new materials "in silico" (in vitro), rather than through time consuming experimentation. The Materials Genome Project is to pursue the theory of large scale modeling as well as powerful methods to construct new materials, with optimized properties. Indeed, it is the intimate synergy between our ability to predict accurately from quantum theory how atoms can be assembled to form new materials and our capacity to synthesize novel materials atom

  13. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  14. [Human genome project: a federator program of genomic medicine].

    PubMed

    Sfar, S; Chouchane, L

    2008-05-01

    The Human Genome Project improves our understanding of the molecular genetic basis of inherited and complex diseases such as diabetes, schizophrenia, and cancer. Information from the human genome sequence is essential for several antenatal and neonatal screening programmes. The new genomic tools emerging from this project have revolutionized biology and medicine and have transformed our understanding of health and the provision of healthcare. Its implications pervade all areas of medicine, from disease prediction and prevention to the diagnosis and treatment of all forms of disease. Increasingly, it will be possible to drive predisposition testing into clinical practice, to develop new treatments or to adapt available treatments more specifically to an individual's genetic make-up. This genomic information should transform traditional medications, which are meant to be effective for every member of the population, into personalized medicine and personalized therapy. Pharmacogenomics could give rise to a new generation of highly effective drugs that treat causes, not just symptoms.

  15. Scientific Goals of the Human Genome Project.

    ERIC Educational Resources Information Center

    Wills, Christopher

    1993-01-01

    The Human Genome Project, an effort to sequence all the DNA of a human cell, is needed to better understand the behavior of chromosomes during cell division, with the ultimate goal of understanding the specific genes contributing to specific diseases and disabilities. (MSE)

  17. Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project.

    PubMed

    Prokopenko, Dmitry; Hecker, Julian; Silverman, Edwin K; Pagano, Marcello; Nöthen, Markus M; Dina, Christian; Lange, Christoph; Fier, Heide Loehlein

    2016-05-01

    Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We examine the effects of different cutoffs for the minor allele frequency on the performance of the approach. We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher resolution picture of population substructure than can be obtained with existing methodology. Furthermore, in simulation studies, we find scenarios where our method was able to control the type 1 error more precisely and showed higher power. Contact: dmitry.prokopenko@uni-bonn.de. Supplementary data are available at Bioinformatics online.
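
    The idea of scoring pairs of individuals by the rare variants they share can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy genotype table, and the 0.25 frequency cutoff are all assumptions made for illustration, and the abstract does not say how the resulting similarities are analysed further.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rare_variant_sets(genotypes, maf_cutoff):
    """Map each individual to the set of rare loci at which it carries the minor allele.

    genotypes: dict of individual -> list of minor-allele counts (0, 1 or 2) per locus.
    A locus counts as rare when its minor allele frequency is > 0 and <= maf_cutoff.
    """
    individuals = list(genotypes)
    n_loci = len(genotypes[individuals[0]])
    n_chromosomes = 2 * len(individuals)  # diploid individuals (assumption)
    rare_loci = set()
    for locus in range(n_loci):
        freq = sum(genotypes[ind][locus] for ind in individuals) / n_chromosomes
        if 0 < freq <= maf_cutoff:
            rare_loci.add(locus)
    return {
        ind: {locus for locus in rare_loci if genotypes[ind][locus] > 0}
        for ind in individuals
    }


def jaccard_matrix(variant_sets):
    """Pairwise Jaccard indices, keyed by sorted pairs of individual names."""
    return {
        (p, q): jaccard_index(variant_sets[p], variant_sets[q])
        for p, q in combinations(sorted(variant_sets), 2)
    }


# Hypothetical toy data: two individuals from each of two populations, five loci.
toy_genotypes = {
    "A1": [1, 1, 0, 0, 1],
    "A2": [1, 0, 0, 0, 1],
    "B1": [0, 0, 1, 1, 1],
    "B2": [0, 0, 1, 0, 1],
}

toy_sets = rare_variant_sets(toy_genotypes, maf_cutoff=0.25)
toy_similarity = jaccard_matrix(toy_sets)
for pair, value in toy_similarity.items():
    print(pair, value)
```

    In this toy table the last locus is too common to pass the cutoff, so only the first four loci contribute. Pairs from the same hypothetical population share rare variants and get a non-zero index, while pairs from different populations share none.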

  18. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing

    PubMed Central

    2016-01-01

    This article is a mini-review that provides a general overview of next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Transcriptome studies are now possible as well, moving beyond earlier hybridization-based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genome databases and providing appropriate human reference genomes, which are necessary components of personalized medicine and precision medicine. PMID:27915479

  19. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing.

    PubMed

    Park, Sang Tae; Kim, Jayoung

    2016-11-01

    This article is a mini-review that provides a general overview of next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Transcriptome studies are now possible as well, moving beyond earlier hybridization-based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genome databases and providing appropriate human reference genomes, which are necessary components of personalized medicine and precision medicine.

  20. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has high potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  1. Whole-Genome Sequencing: Manual Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-01-03

    This protocol describes a manual approach for the preparation of genomic DNA libraries suitable for Illumina sequencing. Genomic DNA fragments produced by shearing by sonication are ligated to adaptors and amplified by polymerase chain reaction (PCR). The amplified DNA, separated by size and gel-purified, is suitable for use as template in whole-genome sequencing.

  2. SP8 Sequencing Extinct Genomes

    PubMed Central

    Poinar, H.

    2007-01-01

    Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are weak molecules compared with other cellular debris, and thus evolutionary biologists are in essence time trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. Sequestering of DNA molecules in humic or Maillard-like complexes likely represents a rich source of DNA molecules from the past, which have yet to be tapped. These molecules were impossible to acquire due to the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, have made it possible, in theory, to identify all short nucleotide sequences in a sample in a non-selective approach, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our recent efforts to sequence the woolly mammoth genome as well as other fossil remains.

  3. Draft Genome Sequence of Lactobacillus rhamnosus 2166

    PubMed Central

    Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains. PMID:24558254

  4. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    USDA-ARS's Scientific Manuscript database

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  5. Weeding out the genes: the Arabidopsis genome project.

    PubMed

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. In this review, the lessons learned for future genome projects are reviewed as well as a summary of the initial findings in Arabidopsis.

  6. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  7. Comprehensive genome sequencing of the liver cancer genome.

    PubMed

    Nakagawa, Hidewaki; Shibata, Tatsuhiro

    2013-11-01

    Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. Recently, comprehensive whole genome and exome sequencing analyses for HCC revealed new cancer-associated genes and a variety of genomic alterations. In particular, frequent genetic alterations of the chromatin remodeling genes were observed, suggesting a new potential therapeutic target for HCC. Sequencing analysis has further identified the molecular complexities of multicentric lesions and intratumoral heterogeneity. Detailed analyses of the somatic substitution pattern of the cancer genome and the HBV virus genome integration sites by using whole-genome sequencing will elucidate the molecular basis and diverse etiological factors involved in liver cancer development.

  8. Insights from 20 years of bacterial genome sequencing.

    PubMed

    Land, Miriam; Hauser, Loren; Jun, Se-Ran; Nookaew, Intawat; Leuze, Michael R; Ahn, Tae-Hyuk; Karpinets, Tatiana; Lund, Ole; Kora, Guruprasad; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-03-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome
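
    The pan- and core genome calculation mentioned in this abstract reduces, at its simplest, to a set union and a set intersection over the gene families found in each genome. The sketch below assumes that genes have already been clustered into families, which is the hard part in practice and is not shown; the function name, strain names, and family identifiers are hypothetical.

```python
def pan_and_core_genome(genomes):
    """Return (pan, core) gene-family sets for a collection of genomes.

    genomes: dict of genome name -> set of gene family identifiers.
    pan  = families present in at least one genome (union).
    core = families present in every genome (intersection).
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)
    core = family_sets[0].intersection(*family_sets[1:])
    return pan, core


# Hypothetical toy input: three strains with made-up family identifiers.
toy_genomes = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famB", "famC", "famE"},
}

toy_pan, toy_core = pan_and_core_genome(toy_genomes)
print(len(toy_pan), "families in the pan genome")
print(len(toy_core), "families in the core genome")
```

    As more genomes are added, the core can only shrink or stay the same and the pan genome can only grow or stay the same, which is consistent with the large gap between the two figures quoted for E. coli.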

  9. Insights from 20 years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V.; Lund, Ole; Kora, Guruprasad H.; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W.

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome

  10. Insights from 20 years of bacterial genome sequencing

    DOE PAGES

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about

  11. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome

  12. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    PubMed Central

    Sikorski, Johannes; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Brettin, Thomas; Detter, John C.; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as an energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of cytochrome c nitrite reductase. Here, we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2,291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304697

  14. The revolution of whole genome sequencing to study parasites.

    PubMed

    Forrester, Sarah Jayne; Hall, Neil

    2014-07-01

    Genome sequencing has revolutionized the way in which we approach biological research from fundamental molecular biology to ecology and epidemiology. In the last 10 years the field of genomics has changed enormously as technology has improved and the tools for genomic sequencing have moved out of a few dedicated centers and now can be performed on bench-top instruments. In this review we will cover some of the key discoveries that were catalyzed by some of the first genome projects and discuss how this field is developing, what the new challenges are and how this may impact on research in the near future.

  15. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2,038 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Complete genome sequence of Spirosoma linguale type strain (1T)

    SciTech Connect

    Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth H; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Detter, J. Chris; Schutze, Andrea; Rohde, Manfred; Tindall, Brian; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-01-01

    Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    PubMed Central

    Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304674

  19. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Transcriptome and genome sequencing uncovers functional variation in humans.

    PubMed

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; 't Hoen, Peter A C; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk P J; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Angel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-09-26

    Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

  1. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

  2. Validation of rice genome sequence by optical mapping

    PubMed Central

    Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

    2007-01-01

    Background: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results: To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences
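
    The in silico restriction maps mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function name `in_silico_digest` and its parameters are hypothetical, and the SwaI recognition sequence (ATTT^AAAT, a blunt cut in the middle of the site) is taken from general enzyme references rather than from the abstract. The idea is simply to find every site in an assembled sequence and list the ordered fragment lengths, which is what would be compared against an optical map.

```python
def in_silico_digest(sequence, site="ATTTAAAT", cut_offset=4):
    """Return ordered fragment lengths for a linear sequence cut at each site.

    Assumptions (not from the abstract): the default site is the SwaI
    recognition sequence, and the enzyme cuts after the 4th base of the site.
    The site is palindromic, so scanning one strand is sufficient.
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Advance by one base so overlapping sites are not missed.
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy sequence with a single site starting at index 2 (cut falls at position 6).
print(in_silico_digest("GGATTTAAATCCCC"))  # [6, 8]
```

    In practice the fragment list from each assembled chromosome would be aligned against the measured optical fragment sizes; the alignment step itself is outside the scope of this sketch.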

  3. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    PubMed

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

  4. Cancer whole-genome sequencing: present and future.

    PubMed

    Nakagawa, H; Wardell, C P; Furuta, M; Taniguchi, H; Fujimoto, A

    2015-12-03

    Recent explosive advances in next-generation sequencing technology and computational approaches to massive data enable us to analyze a number of cancer genome profiles by whole-genome sequencing (WGS). To explore cancer genomic alterations and their diversity comprehensively, global and local cancer genome-sequencing projects, including ICGC and TCGA, have been analyzing many types of cancer genomes mainly by exome sequencing. However, there is limited information on somatic mutations in non-coding regions including untranslated regions, introns, regulatory elements and non-coding RNAs, and rearrangements, sometimes producing fusion genes, and pathogen detection in cancer genomes remain widely unexplored. WGS approaches can detect these unexplored mutations, as well as coding mutations and somatic copy number alterations, and help us to better understand the whole landscape of cancer genomes and elucidate functions of these unexplored genomic regions. Analysis of cancer genomes using the present WGS platforms is still primitive and there are substantial improvements to be made in sequencing technologies, informatics and computer resources. Taking account of the extreme diversity of cancer genomes and phenotype, it is also required to analyze much more WGS data and integrate these with multi-omics data, functional data and clinical-pathological data in a large number of sample sets to interpret them more fully and efficiently.

  5. Progress in Understanding and Sequencing the Genome of Brassica rapa

    PubMed Central

    Hong, Chang Pyo; Kwon, Soo-Jin; Kim, Jung Sun; Yang, Tae-Jin; Park, Beom-Seok; Lim, Yong Pyo

    2008-01-01

    Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassicaceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization. PMID:18288250

  6. Rickettsia felis, from culture to genome sequencing.

    PubMed

    Ogata, H; Robert, C; Audic, S; Robineau, S; Blanc, G; Fournier, P E; Renesto, P; Claverie, J M; Raoult, D

    2005-12-01

    Rickettsia felis has recently been cultured in XTC2 cells. This allows production of enough bacteria to create a genomic bank and to sequence it. The chromosome of R. felis is longer than that of previously sequenced rickettsiae and it possesses two plasmids. Microscopically, this bacterium exhibits two forms of pili: one resembles a conjugative pilus and another forms hair-like projections that may play a role in pathogenicity. R. felis also exhibits several copies of ankyrin-repeat genes and tetratricopeptide-encoding genes that are specifically linked to pathogenic host-associated bacteria. It also contains toxin-antitoxin system-encoding genes that are extremely rare in intracellular bacteria and may be linked to plasmid maintenance.

  7. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  8. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    PubMed

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  9. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    PubMed Central

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences. PMID:22247527

  10. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    PubMed

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  11. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    PubMed Central

    Deakin, Janine E.

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  12. Personal genomes in progress: from the human genome project to the personal genome project.

    PubMed

    Lunshof, Jeantine E; Bobe, Jason; Aach, John; Angrist, Misha; Thakuria, Joseph V; Vorhaus, Daniel B; Hoehe, Margret R; Church, George M

    2010-01-01

    The cost of a diploid human genome sequence has dropped from about $70M to $2000 since 2007--even as the standards for redundancy have increased from 7x to 40x in order to improve call rates. Coupled with the low return on investment for common single-nucleotide polymorphisms, this has caused a significant rise in interest in correlating genome sequences with comprehensive environmental and trait data (GET). The costs of electronic health records, imaging, and microbial, immunological, and behavioral data are also dropping quickly. Sharing such integrated GET datasets and their interpretations with a diversity of researchers and research subjects highlights the need for informed-consent models capable of addressing novel privacy and other issues, as well as for flexible data-sharing resources that make materials and data available with minimum restrictions on use. This article examines the Personal Genome Project's effort to develop a GET database as a public genomics resource broadly accessible to both researchers and research participants, while pursuing the highest standards in research ethics.

  13. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    PubMed

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
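
    The read depths quoted in this abstract follow the usual definition of coverage, depth = (number of reads x read length) / genome size. The sketch below applies that relation to the three genome sizes given above. The 100 bp read length and the function names are illustrative assumptions, not values reported in the abstract.

```python
import math


def coverage_depth(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size (in bases)."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_depth * genome_size / read_length)


# Genome sizes as quoted in the abstract, in bases.
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH = 100  # assumed read length in bases, for illustration only

for name, size in genomes.items():
    print(name, reads_needed(50, READ_LENGTH, size))
```

    Under these assumptions, 50X depth for the 4.6 million base E. coli genome corresponds to 2,300,000 reads; doubling the target to 100X doubles the read count.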

  14. Genomics and the Human Genome Project: implications for psychiatry.

    PubMed

    Kelsoe, John R

    2004-11-01

    In the past decade the Human Genome Project has made extraordinary strides in understanding of fundamental human genetics. The complete human genetic sequence has been determined, and the chromosomal location of almost all human genes identified. Presently, a large international consortium, the HapMap Project, is working to identify a large portion of genetic variation in different human populations and the structure and relationship of these variants to each other. The Human Genome Project has approached human genetics on a scale not previously seen in biology. This has been made possible by dramatic advances in high throughput technology and bio-informatics. Tools such as gene chips and micro-arrays have spawned an entirely new strategy to examine the function and expression of genes in a massively parallel fashion. Together these tools have dramatically advanced our knowledge about the human genome. They promise powerful new approaches to complex genetic traits such as psychiatric illness. The goals and progress of the Human Genome Project and the technology involved are reviewed. The implications of this science for psychiatric genetics are discussed.

  15. Maize genome sequencing by methylation filtration.

    PubMed

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  16. Human genome sequencing in health and disease.

    PubMed

    Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

    2012-01-01

    Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

  17. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  18. Complete genome sequence of Cellulomonas flavigena type strain (134).

    PubMed

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-07-29

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    PubMed Central

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304688

  20. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    SciTech Connect

    Abt, Birte; Foster, Brian; Lapidus, Alla L.; Clum, Alicia; Sun, Hui; Pukall, Rudiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. The Z curve database: a graphic representation of genome sequences.

    PubMed

    Zhang, Chun-Ting; Zhang, Ren; Ou, Hong-Yu

    2003-03-22

    Genome projects for many prokaryotic and eukaryotic species have been completed and more are currently underway. The availability of a large number of genomic sequences creates a need for graphic tools to study genomes in a perceivable form. The Z curve is one such tool for visualizing genomes. The Z curve is a unique three-dimensional curve representation for a given DNA sequence in the sense that each can be uniquely reconstructed given the other. The Z curve database for more than 1000 genomes has been established here. The database contains the Z curves for archaea, bacteria, eukaryota, organelles, phages, plasmids, viroids and viruses, whose genomic sequences are currently available. All the 3-dimensional Z curves and their three component curves are stored in the database. The applications of the Z curve database to comparative genomics, gene prediction, computation of G+C content with a windowless technique, prediction of replication origins and terminations of bacterial and archaeal genomes, and study of local deviations from the Chargaff Parity Rule 2, etc., are presented in detail. The Z curve database reported here is a treasure trove in which biologists could find useful biological knowledge.
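
    As a minimal sketch of the representation described in this abstract, the code below computes Z curve coordinates from cumulative base counts using the standard definition: x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), z_n = (A_n + T_n) - (C_n + G_n). The function names are hypothetical and this is not the database's own code. The second function shows why the mapping is one-to-one: each base contributes a distinct step, so the sequence can be read back from the curve.

```python
# Step contributed by each base to (x, y, z); all four steps are distinct.
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(sequence):
    """Return the list of (x_n, y_n, z_n) points for n = 1..len(sequence)."""
    x = y = z = 0
    points = []
    for base in sequence.upper():
        if base not in STEPS:
            raise ValueError(f"unsupported base: {base!r}")
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Invert z_curve by looking up the step between consecutive points."""
    lookup = {step: base for base, step in STEPS.items()}
    previous = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(p - q for p, q in zip(point, previous))
        bases.append(lookup[step])
        previous = point
    return "".join(bases)


def gc_count(points):
    """G+C count of the whole sequence: z_n = (A+T) - (C+G), n = (A+T) + (C+G)."""
    n = len(points)
    return (n - points[-1][2]) // 2


curve = z_curve("ACGT")
print(curve)                         # [(1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
print(sequence_from_z_curve(curve))  # ACGT
```

    The `gc_count` helper reflects the fact that the z component alone tracks A+T versus G+C, which is the basis of computing G+C content from the curve without a sliding window.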

  2. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated that this new isolate forms an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species, Psittaciform 2 bornavirus.

  3. Human genome project and sickle cell disease.

    PubMed

    Norman, Brenda J; Miller, Sheila D

    2011-01-01

    Sickle cell disease is one of the most common genetic blood disorders in the United States, affecting 1 in every 375 African Americans. Sickle cell disease is an inherited condition caused by abnormal hemoglobin in the red blood cells. The Human Genome Project has provided valuable insight and extensive research advances in the understanding of the human genome and sickle cell disease. Significant progress in genetic knowledge has increased the ability of researchers to map and sequence genes for diagnosis, treatment, and prevention of sickle cell disease and other chronic illnesses. This article explores some of the recent knowledge and advances about sickle cell disease and the Human Genome Project.

  4. [DNA analysis for the post genome-sequencing era].

    PubMed

    Kambara, Hideki

    2002-05-01

    With the completion of human genome sequencing, the post genome-sequencing era has begun. The major tasks are to clarify the functions of genes and to apply this information in medical and various industrial fields. Various DNA analysis methods and instruments for gene expression profiling and for analyzing genetic diversity, including SNP typing, are required and have been developed. Here, the history and technologies of DNA analysis, including the Wada project in the early 1980s and the Human Genome Project from 1990, are described. Various new technologies have been developed in this decade. They include a capillary gel array DNA sequencer, DNA chips, bead probe arrays, a new DNA sequencing method using pyrosequencing and an efficient SNP typing method by BAMPER.

  5. GenomeCons: a web server for manipulating multiple genome sequence alignments and their consensus sequences.

    PubMed

    Sato, Tetsuya; Suyama, Mikita

    2015-04-15

    Genome sequence alignments provide valuable information on many aspects of molecular biological processes. In this study, we developed a web server, GenomeCons, for manipulating multiple genome sequence alignments and their consensus sequences for high-throughput genome sequence analyses. This server facilitates the visual inspection of multiple genome sequence alignments for a set of genomic intervals at a time. This allows the user to examine how these sites are evolutionarily conserved over time for their functional importance. The server also reports consensus sequences for the input genomic intervals, which can be applied to downstream analyses such as the identification of common motifs in the regions determined by ChIP-seq experiments. GenomeCons is freely accessible at http://bioinfo.sls.kyushu-u.ac.jp/genomecons/. Contact: mikita@bioreg.kyushu-u.ac.jp. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
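
    As a rough illustration of what a consensus sequence over a multiple alignment means, the sketch below takes a simple per-column majority vote. This is an assumed, generic rule chosen for illustration; the abstract does not state how GenomeCons derives its consensus, and the function name, gap symbol and tie handling are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Per-column majority-vote consensus of equal-length aligned sequences.

    Gaps are ignored when counting; a column of only gaps yields a gap,
    and a tie between the most frequent bases yields 'N'.
    """
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column if base != gap)
        if not counts:
            consensus.append(gap)
            continue
        ranked = counts.most_common()
        top_base, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append("N")  # ambiguous column
        else:
            consensus.append(top_base)
    return "".join(consensus)
```

    A consensus string of this kind is what would then be passed to a downstream step such as motif discovery.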

  6. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  7. Meeting the challenges of non-referenced genome assembly from short-read sequence data

    Treesearch

    M. Parks; A. Liston; R. Cronn

    2010-01-01

    Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...

  8. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  9. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  10. Completely phased genome sequencing through chromosome sorting

    PubMed Central

    Yang, Hong; Chen, Xi; Wong, Wing Hung

    2011-01-01

    The two haploid genome sequences that a person inherits from the two parents represent the most fundamentally useful type of genetic information for the study of heritable diseases and the development of personalized medicine. Because of the difficulty in obtaining long-range phase information, current sequencing methods are unable to provide this information. Here, we introduce and show feasibility of a scalable approach capable of generating genomic sequences completely phased across the entire chromosome. PMID:21169219

  11. Complete genome sequence of Serratia plymuthica strain AS12

    SciTech Connect

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne A.; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla L.; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, J. Chris; Land, Miriam L; Hauser, Loren John; Cheng, Jan-Fang; Ivanova, N; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  12. Complete genome sequence of Serratia plymuthica strain AS12

    PubMed Central

    Finlay, Roger D.; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

    2012-01-01

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”. PMID:22768360

  13. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    PubMed

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, as contigs or scaffolds, can be submitted to our Web server, which provides functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

  14. Peanut (Arachis hypogaea) Expressed Sequence Tag Project: Progress and Application

    PubMed Central

    Feng, Suping; Wang, Xingjun; Zhang, Xinyou; Dang, Phat M.; Holbrook, C. Corley; Culbreath, Albert K.; Wu, Yaoting; Guo, Baozhu

    2012-01-01

    Many plant ESTs have been sequenced as an alternative to whole genome sequences, including those of peanut, because of genome size and complexity. The US peanut research community held the historic 2004 Atlanta Genomics Workshop and named the EST project as a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have been shown to have a wide range of applications and fulfilled their essential role at the time of need. The EST project then contributed to the second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or transcriptomes will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function. PMID:22745594

  15. An overview of the human genome project

    SciTech Connect

    Batzer, M.A.

    1994-01-01

    The human genome project is one of the most ambitious scientific projects to date, with the ultimate goal being a nucleotide sequence for all four billion bases of human DNA. In the process of determining the nucleotide sequence for each base, the location, function, and regulatory regions from the estimated 100,000 human genes will be identified. The genome project itself relies upon maps of the human genetic code derived from several different levels of resolution. Genetic linkage analysis provides a low resolution genome map. The information for genetic linkage maps is derived from the analysis of chromosome specific markers such as Sequence Tagged Sites (STSs), Variable Number of Tandem Repeats (VNTRs) or other polymorphic (highly informative) loci in a number of different families. Using this information the location of an unknown disease gene can be limited to a region comprised of one million base pairs of DNA or less. After this point, one must construct or have access to a physical map of the region of interest. Physical mapping involves the construction of an ordered overlapping (contiguous) set of recombinant DNA clones. These clones may be derived from a number of different vectors including cosmids, Bacterial Artificial Chromosomes (BACs), P1 derived Artificial Chromosomes (PACs), somatic cell hybrids, or Yeast Artificial Chromosomes (YACs). The ultimate goal for physical mapping is to establish a completely overlapping (contiguous) set of clones for the entire genome. After a gene or region of interest has been localized using physical mapping the nucleotide sequence is determined. The overlap between genetic mapping, physical mapping and DNA sequencing has proven to be a powerful tool for the isolation of disease genes through positional cloning.

  16. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  17. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  18. The genome sequence of Drosophila melanogaster.

    PubMed

    Adams, M D; Celniker, S E; Holt, R A; Evans, C A; Gocayne, J D; Amanatides, P G; Scherer, S E; Li, P W; Hoskins, R A; Galle, R F; George, R A; Lewis, S E; Richards, S; Ashburner, M; Henderson, S N; Sutton, G G; Wortman, J R; Yandell, M D; Zhang, Q; Chen, L X; Brandon, R C; Rogers, Y H; Blazej, R G; Champe, M; Pfeiffer, B D; Wan, K H; Doyle, C; Baxter, E G; Helt, G; Nelson, C R; Gabor, G L; Abril, J F; Agbayani, A; An, H J; Andrews-Pfannkoch, C; Baldwin, D; Ballew, R M; Basu, A; Baxendale, J; Bayraktaroglu, L; Beasley, E M; Beeson, K Y; Benos, P V; Berman, B P; Bhandari, D; Bolshakov, S; Borkova, D; Botchan, M R; Bouck, J; Brokstein, P; Brottier, P; Burtis, K C; Busam, D A; Butler, H; Cadieu, E; Center, A; Chandra, I; Cherry, J M; Cawley, S; Dahlke, C; Davenport, L B; Davies, P; de Pablos, B; Delcher, A; Deng, Z; Mays, A D; Dew, I; Dietz, S M; Dodson, K; Doup, L E; Downes, M; Dugan-Rocha, S; Dunkov, B C; Dunn, P; Durbin, K J; Evangelista, C C; Ferraz, C; Ferriera, S; Fleischmann, W; Fosler, C; Gabrielian, A E; Garg, N S; Gelbart, W M; Glasser, K; Glodek, A; Gong, F; Gorrell, J H; Gu, Z; Guan, P; Harris, M; Harris, N L; Harvey, D; Heiman, T J; Hernandez, J R; Houck, J; Hostin, D; Houston, K A; Howland, T J; Wei, M H; Ibegwam, C; Jalali, M; Kalush, F; Karpen, G H; Ke, Z; Kennison, J A; Ketchum, K A; Kimmel, B E; Kodira, C D; Kraft, C; Kravitz, S; Kulp, D; Lai, Z; Lasko, P; Lei, Y; Levitsky, A A; Li, J; Li, Z; Liang, Y; Lin, X; Liu, X; Mattei, B; McIntosh, T C; McLeod, M P; McPherson, D; Merkulov, G; Milshina, N V; Mobarry, C; Morris, J; Moshrefi, A; Mount, S M; Moy, M; Murphy, B; Murphy, L; Muzny, D M; Nelson, D L; Nelson, D R; Nelson, K A; Nixon, K; Nusskern, D R; Pacleb, J M; Palazzolo, M; Pittman, G S; Pan, S; Pollard, J; Puri, V; Reese, M G; Reinert, K; Remington, K; Saunders, R D; Scheeler, F; Shen, H; Shue, B C; Sidén-Kiamos, I; Simpson, M; Skupski, M P; Smith, T; Spier, E; Spradling, A C; Stapleton, M; Strong, R; Sun, E; Svirskas, R; Tector, C; Turner, R; 
Venter, E; Wang, A H; Wang, X; Wang, Z Y; Wassarman, D A; Weinstock, G M; Weissenbach, J; Williams, S M; Woodage, T; Worley, K C; Wu, D; Yang, S; Yao, Q A; Ye, J; Yeh, R F; Zaveri, J S; Zhan, M; Zhang, G; Zhao, Q; Zheng, L; Zheng, X H; Zhong, F N; Zhong, W; Zhou, X; Zhu, S; Zhu, X; Smith, H O; Gibbs, R A; Myers, E W; Rubin, G M; Venter, J C

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  19. Accurate whole human genome sequencing using reversible terminator chemistry.

    PubMed

    Bentley, David R; Balasubramanian, Shankar; Swerdlow, Harold P; Smith, Geoffrey P; Milton, John; Brown, Clive G; Hall, Kevin P; Evers, Dirk J; Barnes, Colin L; Bignell, Helen R; Boutell, Jonathan M; Bryant, Jason; Carter, Richard J; Keira Cheetham, R; Cox, Anthony J; Ellis, Darren J; Flatbush, Michael R; Gormley, Niall A; Humphray, Sean J; Irving, Leslie J; Karbelashvili, Mirian S; Kirk, Scott M; Li, Heng; Liu, Xiaohai; Maisinger, Klaus S; Murray, Lisa J; Obradovic, Bojan; Ost, Tobias; Parkinson, Michael L; Pratt, Mark R; Rasolonjatovo, Isabelle M J; Reed, Mark T; Rigatti, Roberto; Rodighiero, Chiara; Ross, Mark T; Sabot, Andrea; Sankar, Subramanian V; Scally, Aylwyn; Schroth, Gary P; Smith, Mark E; Smith, Vincent P; Spiridou, Anastassia; Torrance, Peta E; Tzonev, Svilen S; Vermaas, Eric H; Walter, Klaudia; Wu, Xiaolin; Zhang, Lu; Alam, Mohammed D; Anastasi, Carole; Aniebo, Ify C; Bailey, David M D; Bancarz, Iain R; Banerjee, Saibal; Barbour, Selena G; Baybayan, Primo A; Benoit, Vincent A; Benson, Kevin F; Bevis, Claire; Black, Phillip J; Boodhun, Asha; Brennan, Joe S; Bridgham, John A; Brown, Rob C; Brown, Andrew A; Buermann, Dale H; Bundu, Abass A; Burrows, James C; Carter, Nigel P; Castillo, Nestor; Chiara E Catenazzi, Maria; Chang, Simon; Neil Cooley, R; Crake, Natasha R; Dada, Olubunmi O; Diakoumakos, Konstantinos D; Dominguez-Fernandez, Belen; Earnshaw, David J; Egbujor, Ugonna C; Elmore, David W; Etchin, Sergey S; Ewan, Mark R; Fedurco, Milan; Fraser, Louise J; Fuentes Fajardo, Karin V; Scott Furey, W; George, David; Gietzen, Kimberley J; Goddard, Colin P; Golda, George S; Granieri, Philip A; Green, David E; Gustafson, David L; Hansen, Nancy F; Harnish, Kevin; Haudenschild, Christian D; Heyer, Narinder I; Hims, Matthew M; Ho, Johnny T; Horgan, Adrian M; Hoschler, Katya; Hurwitz, Steve; Ivanov, Denis V; Johnson, Maria Q; James, Terena; Huw Jones, T A; Kang, Gyoung-Dong; Kerelska, Tzvetana H; Kersey, Alan D; Khrebtukova, Irina; Kindwall, Alex P; Kingsbury, 
Zoya; Kokko-Gonzales, Paula I; Kumar, Anil; Laurent, Marc A; Lawley, Cynthia T; Lee, Sarah E; Lee, Xavier; Liao, Arnold K; Loch, Jennifer A; Lok, Mitch; Luo, Shujun; Mammen, Radhika M; Martin, John W; McCauley, Patrick G; McNitt, Paul; Mehta, Parul; Moon, Keith W; Mullens, Joe W; Newington, Taksina; Ning, Zemin; Ling Ng, Bee; Novo, Sonia M; O'Neill, Michael J; Osborne, Mark A; Osnowski, Andrew; Ostadan, Omead; Paraschos, Lambros L; Pickering, Lea; Pike, Andrew C; Pike, Alger C; Chris Pinkard, D; Pliskin, Daniel P; Podhasky, Joe; Quijano, Victor J; Raczy, Come; Rae, Vicki H; Rawlings, Stephen R; Chiva Rodriguez, Ana; Roe, Phyllida M; Rogers, John; Rogert Bacigalupo, Maria C; Romanov, Nikolai; Romieu, Anthony; Roth, Rithy K; Rourke, Natalie J; Ruediger, Silke T; Rusman, Eli; Sanches-Kuiper, Raquel M; Schenker, Martin R; Seoane, Josefina M; Shaw, Richard J; Shiver, Mitch K; Short, Steven W; Sizto, Ning L; Sluis, Johannes P; Smith, Melanie A; Ernest Sohna Sohna, Jean; Spence, Eric J; Stevens, Kim; Sutton, Neil; Szajkowski, Lukasz; Tregidgo, Carolyn L; Turcatti, Gerardo; Vandevondele, Stephanie; Verhovsky, Yuli; Virk, Selene M; Wakelin, Suzanne; Walcott, Gregory C; Wang, Jingwen; Worsley, Graham J; Yan, Juying; Yau, Ling; Zuerlein, Mike; Rogers, Jane; Mullikin, James C; Hurles, Matthew E; McCooke, Nick J; West, John S; Oaks, Frank L; Lundberg, Peter L; Klenerman, David; Durbin, Richard; Smith, Anthony J

    2008-11-06

    DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
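
    The relationship between read count, read length and average depth mentioned above can be made concrete with a small worked calculation. The sketch assumes the usual definition of mean depth as total sequenced bases divided by genome size; the genome size in the usage note is an illustrative assumption, not a figure taken from the abstract, and the function names are hypothetical.

```python
import math


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target average depth."""
    return math.ceil(target_depth * genome_size / read_length)
```

    For example, under an assumed genome size of about 3.1 billion bases, reads_needed(30, 35, 3_100_000_000) comes to roughly 2.66 billion 35-base reads. This ignores unmappable reads, duplicates and uneven coverage, so real experiments would need more.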

  20. Human genetics and genomics a decade after the release of the draft sequence of the human genome.

    PubMed

    Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

    2011-10-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

  1. Human genetics and genomics a decade after the release of the draft sequence of the human genome

    PubMed Central

    2011-01-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605

  2. Draft genome sequences of 21 Salmonella enterica serovar enteritidis strains.

    PubMed

    Timme, Ruth E; Allard, Marc W; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E; Zheng, Jie; Stones, Robert; Wilson, Mark R; Musser, Steven M; Brown, Eric W

    2012-11-01

    Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.

  3. BioProject Number PRJNA254501: Total of 190 sampled plants from Physalis philadelphica (tomatillo) genome sequencing BioSamples SAMN02904339-SAMN02904528

    USDA-ARS's Scientific Manuscript database

    This BioProject consists of raw genotyping-by-sequencing data collected in 96-plex format on an Illumina HiSeq 2000 sequencing system. There were 190 sampled plants from Physalis philadelphica (tomatillo). The experiment resulted in the development of more than 77,000 single nucleotide polymorphism ...

  4. The Human Genome Diversity Project

    SciTech Connect

    Cavalli-Sforza, L.

    1994-12-31

    The Human Genome Diversity Project (HGD Project) is an international anthropology project that seeks to study the genetic richness of the entire human species. This kind of genetic information can add a unique thread to the tapestry knowledge of humanity. Culture, environment, history, and other factors are often more important, but humanity's genetic heritage, when analyzed with recent technology, brings another type of evidence for understanding species' past and present. The Project will deepen the understanding of this genetic richness and show both humanity's diversity and its deep and underlying unity. The HGD Project is still largely in its planning stages, seeking the best ways to reach its goals. The continuing discussions of the Project, throughout the world, should improve the plans for the Project and their implementation. The Project is as global as humanity itself; its implementation will require the kinds of partnerships among different nations and cultures that make the involvement of UNESCO and other international organizations particularly appropriate. The author will briefly discuss the Project's history, describe the Project, set out the core principles of the Project, and demonstrate how the Project will help combat the scourge of racism.

  5. Implications of the Human Genome Project

    SciTech Connect

    Kitcher, P.

    1998-11-01

    The Human Genome Project (HGP), launched in 1991, aims to map and sequence the human genome by 2006. During the fifteen-year life of the project, it is projected that $3 billion in federal funds will be allocated to it. The ultimate aims of spending this money are to analyze the structure of human DNA, to identify all human genes, to recognize the functions of those genes, and to prepare for the biology and medicine of the twenty-first century. The following summary examines some of the implications of the program, concentrating on its scientific import and on the ethical and social problems that it raises. Its aim is to expose principles that might be used in applying the information which the HGP will generate. There is no attempt here to translate the principles into detailed proposals for legislation. Arguments and discussion can be found in the full report, but, like this summary, that report does not contain any legislative proposals.

  6. Exploiting long read sequencing technologies to establish high quality highly contiguous pig reference genome assemblies

    USDA-ARS's Scientific Manuscript database

    The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...

  7. Cancer genome-sequencing study design.

    PubMed

    Mwenifumbo, Jill C; Marra, Marco A

    2013-05-01

    Discoveries from cancer genome sequencing have the potential to translate into advances in cancer prevention, diagnostics, prognostics, treatment and basic biology. Given the diversity of downstream applications, cancer genome-sequencing studies need to be designed to best fulfil specific aims. Knowledge of second-generation cancer genome-sequencing study design also facilitates assessment of the validity and importance of the rapidly growing number of published studies. In this Review, we focus on the practical application of second-generation sequencing technology (also known as next-generation sequencing) to cancer genomics and discuss how aspects of study design and methodological considerations - such as the size and composition of the discovery cohort - can be tailored to serve specific research aims.

  8. Strategies for complete plastid genome sequencing.

    PubMed

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  9. The Human Genome Project: how do we protect Australians?

    PubMed

    Stott Despoja, N

    It is the moon landing of the nineties: the ambitious Human Genome Project--identifying the up to 100,000 genes that make up human DNA and the sequences of the three billion base-pairs that comprise the human genome. However, unlike the moon landing, the effects of the genome project will have a fundamental impact on the way we see ourselves and each other.

  10. Standardized metadata for human pathogen/vector genomic sequences.

    PubMed

    Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a

  11. Standardized Metadata for Human Pathogen/Vector Genomic Sequences

    PubMed Central

    Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a

  12. Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes.

    PubMed

    St John, John A; Braun, Edward L; Isberg, Sally R; Miles, Lee G; Chong, Amanda Y; Gongora, Jaime; Dalzell, Pauline; Moran, Christopher; Bed'hom, Bertrand; Abzhanov, Arkhat; Burgess, Shane C; Cooksey, Amanda M; Castoe, Todd A; Crawford, Nicholas G; Densmore, Llewellyn D; Drew, Jennifer C; Edwards, Scott V; Faircloth, Brant C; Fujita, Matthew K; Greenwold, Matthew J; Hoffmann, Federico G; Howard, Jonathan M; Iguchi, Taisen; Janes, Daniel E; Khan, Shahid Yar; Kohno, Satomi; de Koning, Ap Jason; Lance, Stacey L; McCarthy, Fiona M; McCormack, John E; Merchant, Mark E; Peterson, Daniel G; Pollock, David D; Pourmand, Nader; Raney, Brian J; Roessler, Kyria A; Sanford, Jeremy R; Sawyer, Roger H; Schmidt, Carl J; Triplett, Eric W; Tuberville, Tracey D; Venegas-Anaya, Miryam; Howard, Jason T; Jarvis, Erich D; Guillette, Louis J; Glenn, Travis C; Green, Richard E; Ray, David A

    2012-01-31

    The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described.

  13. Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes

    PubMed Central

    2012-01-01

    The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described. PMID:22293439

  14. Coupled amplification and sequencing of genomic DNA.

    PubMed Central

    Ruano, G; Kidd, K K

    1991-01-01

    Addition of dideoxyribonucleotides during the exponential phase of the PCR should result in the synthesis of two complementary sequence ladders. We have explored this hypothesis to develop coupled amplification and sequencing of genomic DNA. Coupled amplification and sequencing is a biphasic method for sequencing both strands of template as they are amplified. Stage I selects and amplifies a single target from the genomic DNA sample. Stage II accomplishes the sequencing as well as additional amplification of the target using aliquots from the stage I reaction mixed with end-labeled primer and dideoxynucleotides. We have successfully applied coupled amplification and sequencing to a 300-base-pair fragment 4 kilobases upstream from HOX2B directly from human whole genomic DNA. PMID:1672768

  15. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  16. Genome Radio Project: Quarterly report

    SciTech Connect

    1997-08-01

    The process of conducting background research for the programs of the Genome Radio Project is continuing. The most developed of the program "backgrounders" have been reviewed by series and program advisors from various fields. Preliminary and background interviews have been conducted with dozens of potential program participants and advisors. Structurally, efforts are being directed toward developing and formalizing the project and series advisor relationships so that the best use can be made of those experts who have offered to assist the project in its presentation of program content. The library of research materials has been expanded considerably, creating a useful resource library for the producers.

  17. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  18. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  19. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    SciTech Connect

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla L.; Ivanova, N; Copeland, A; Lucas, Susan; Glavina Del Rio, Tijana; Chen, Feng; Sims, David; Meincke, Linda; Bruce, David; Goodwin, Lynne A.; Han, Cliff; Detter, J. Chris; Ovchinnikova, Galina; Pati, Amrita; Mavromatis, K; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sproer, Cathrin; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chain, Patrick S. G.

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Complete genome sequence of Pirellula staleyi type strain (ATCC 27377).

    PubMed

    Clum, Alicia; Tindall, Brian J; Sikorski, Johannes; Ivanova, Natalia; Mavrommatis, Konstantinos; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Brettin, Thomas; Han, Cliff; Detter, John C; Kuske, Cheryl; Bruce, David; Goodwin, Lynne; Ovchinikova, Galina; Pati, Amrita; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

    2009-12-30

    Pirellula staleyi Schlesner and Hirsch 1987 is the type species of the genus Pirellula of the family Planctomycetaceae. Members of this pear- or teardrop-shaped bacterium show a clearly visible pointed attachment pole and can be distinguished from other Planctomycetes by a lack of true stalks. Strains closely related to the species have been isolated from fresh and brackish water, as well as from hypersaline lakes. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the order Planctomyces and only the second sequence from the phylum Planctobacteria/Planctomycetes. The 6,196,199 bp long genome with its 4773 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Kangiella koreensis type strain (SW-125).

    PubMed

    Han, Cliff; Sikorski, Johannes; Lapidus, Alla; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Copeland, Alex; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Brettin, Thomas; Göker, Markus; Tindall, Brian J; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

    2009-11-22

    Kangiella koreensis (Yoon et al. 2004) is the type species of the genus and is of phylogenetic interest because of the very isolated location of the genus Kangiella in the gammaproteobacterial order Oceanospirillales. K. koreensis SW-125(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from tidal flat sediments at Daepo Beach, Yellow Sea, Korea. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first completed genome sequence from the genus Kangiella and only the fourth genome from the order Oceanospirillales. This 2,852,073 bp long single replicon genome with its 2647 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Haliangium ochraceum type strain (SMP-2).

    PubMed

    Ivanova, Natalia; Daum, Chris; Lang, Elke; Abt, Birte; Kopitz, Markus; Saunders, Elizabeth; Lapidus, Alla; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Pati, Amrita; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Markowitz, Victor; Eisen, Jonathan A; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-28

    Haliangium ochraceum Fudou et al. 2002 is the type species of the genus Haliangium in the myxococcal family 'Haliangiaceae'. Members of the genus Haliangium are the first halophilic myxobacterial taxa described. The cells of the species follow a multicellular lifestyle in highly organized biofilms, called swarms; they decompose bacterial and yeast cells as most myxobacteria do. The fruiting bodies contain particularly small coccoid myxospores. H. ochraceum encodes the first actin homologue identified in a bacterial genome. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the myxococcal suborder Nannocystineae, and the 9,446,314 bp long single replicon genome with its 6,898 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. First Complete Genome Sequence of Corynebacterium riegelii

    PubMed Central

    Greninger, Alexander L.; Streithorst, Jessica

    2017-01-01

    Here, we report the first complete genome sequence of Corynebacterium riegelii strain PUDD_83A45, isolated from the urine of a patient with urinary tract infection. The genome measured 2.56 Mb and contained no plasmid. PMID:28360160

  4. Multiple Genome Sequences of Lactobacillus plantarum Strains.

    PubMed

    Kafka, Thomas A; Geissler, Andreas J; Vogel, Rudi F

    2017-07-20

    We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provides the basis for consecutive analyses. Copyright © 2017 Kafka et al.

  5. Multiple Genome Sequences of Lactobacillus plantarum Strains

    PubMed Central

    Kafka, Thomas A.; Geissler, Andreas J.

    2017-01-01

    We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provides the basis for consecutive analyses. PMID:28729269

  6. Virtually sequenced: The next genomic generation

    SciTech Connect

    Bains, W.

    1996-06-01

    The announcement of "virtual genomics" requires evaluation of the efficiency and accuracy of computer-generated sequencing efforts. "Digital Northerns", or Northern blot electrophoresis done in the realm of computer data, have been developed by Incyte Pharmaceuticals (Palo Alto, CA) and Human Genome Sciences (Rockville, MD). 12 refs., 2 figs.

  7. Refined Pichia pastoris reference genome sequence.

    PubMed

    Sturmberger, Lukas; Chappell, Thomas; Geier, Martina; Krainer, Florian; Day, Kasey J; Vide, Ursa; Trstenjak, Sara; Schiefer, Anja; Richardson, Toby; Soriaga, Leah; Darnhofer, Barbara; Birner-Gruenberger, Ruth; Glick, Benjamin S; Tolstorukov, Ilya; Cregg, James; Madden, Knut; Glieder, Anton

    2016-10-10

    Strains of the species Komagataella phaffii are the most frequently used "Pichia pastoris" strains employed for recombinant protein production as well as studies on peroxisome biogenesis, autophagy and secretory pathway analyses. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps from 20 to 6000 base pairs were closed and 5111 out of 5256 putative open reading frames were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new open reading frames (ORFs) and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76kbp between two previous gaps on chromosome 1 and another 134kbp fragment at the end of chromosome 4, as well as several shorter fragments needed re-orientation. In total more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5kbp found in some strains of P. pastoris.

  8. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  9. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  10. Genomic sequencing of Pleistocene cave bears.

    PubMed

    Noonan, James P; Hofreiter, Michael; Smith, Doug; Priest, James R; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J Chris; Pääbo, Svante; Rubin, Edward M

    2005-07-22

    Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of approximately 1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs.

  11. Direct determination of diploid genome sequences

    PubMed Central

    Weisenfeld, Neil I.; Kumar, Vijay; Shah, Preyas; Church, Deanna M.; Jaffe, David B.

    2017-01-01

    Determining the genome sequence of an organism is challenging, yet fundamental to understanding its biology. Over the past decade, thousands of human genomes have been sequenced, contributing deeply to biomedical research. In the vast majority of cases, these have been analyzed by aligning sequence reads to a single reference genome, biasing the resulting analyses, and in general, failing to capture sequences novel to a given genome. Some de novo assemblies have been constructed free of reference bias, but nearly all were constructed by merging homologous loci into single “consensus” sequences, generally absent from nature. These assemblies do not correctly represent the diploid biology of an individual. In exactly two cases, true diploid de novo assemblies have been made, at great expense. One was generated using Sanger sequencing, and one using thousands of clone pools. Here, we demonstrate a straightforward and low-cost method for creating true diploid de novo assemblies. We make a single library from ∼1 ng of high molecular weight DNA, using the 10x Genomics microfluidic platform to partition the genome. We applied this technique to seven human samples, generating low-cost HiSeq X data, then assembled these using a new “pushbutton” algorithm, Supernova. Each computation took 2 d on a single server. Each yielded contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb. Our method provides a scalable capability for determining the actual diploid genome sequence in a sample, opening the door to new approaches in genomic biology and medicine. PMID:28381613

  12. My Identical Twin Sequenced our Genome.

    PubMed

    Schilit, Samantha L P; Schilit Nitenson, Arielle

    2017-04-01

    With rapidly declining costs, whole genome sequencing is becoming feasible for widespread use. Although cost-effectiveness is driving increased use of the technology, comprehensive recommendations on how to handle ethical dilemmas have yet to reach a consensus. In this article, Sam shares her experience of undergoing whole genome sequencing. Despite the deeply private nature of the test, the results do not solely belong to Sam; her identical twin sister, Arielle, shares virtually the same genome and received results without a formal consent process. This article explores their parallel experiences as a way of highlighting the controversial ethics of a private test with familial implications.

  13. The Human Genome Project: big science transforms biology and medicine.

    PubMed

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

  14. Harnessing Whole Genome Sequencing in Medical Mycology.

    PubMed

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  15. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group, and Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only few differences compared to partial sequences of modern strains. PMID:26212363

  16. Complete Genome Sequence of Lleida Bat Lyssavirus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Wise, Emma L.; Aréchiga-Ceballos, Nidia; Freuling, Conrad M.; Banyard, Ashley C.; McElhinney, Lorraine M.; de Lamballerie, Xavier; Müller, Thomas; Echevarría, Juan E.

    2017-01-01

    All lyssaviruses (family Rhabdoviridae) cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Using next-generation sequencing, the full-genome sequence for a novel lyssavirus, Lleida bat lyssavirus (LLEBV), from the original brain of a common bent-winged bat has been confirmed. PMID:28082487

  17. Genome Sequence of Pseudomonas chlororaphis Strain 189.

    PubMed

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M; Dumonceaux, Tim J

    2016-06-23

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis.

  18. Draft Genome Sequences of Elizabethkingia meningoseptica

    PubMed Central

    Matyi, Stephanie A.; Hoyt, Peter R.; Hosoyama, Akira; Yamazoe, Atsushi; Fujita, Nobuyuki

    2013-01-01

    Elizabethkingia meningoseptica is ubiquitous in nature, exhibits a multiple-antibiotic resistance phenotype, and causes rare opportunistic infections. We now report two draft genome sequences of E. meningoseptica type strains that were sequenced independently in two laboratories. PMID:23846266

  19. Origins of the Human Genome Project

    DOE R&D Accomplishments Database

    Cook-Deegan, Robert (Affiliation: Institute of Medicine, National Academy of Sciences)

    1993-07-01

    The human genome project was born of technology, grew into a science bureaucracy in the United States and throughout the world, and is now being transformed into a hybrid academic and commercial enterprise. The next phase of the project promises to veer more sharply toward commercial application, harnessing both the technical prowess of molecular biology and the rapidly growing body of knowledge about DNA structure to the pursuit of practical benefits. Faith that the systematic analysis of DNA structure will prove to be a powerful research tool underlies the rationale behind the genome project. The notion that most genetic information is embedded in the sequence of DNA base pairs comprising chromosomes is a central tenet. A rough analogy is to liken an organism's genetic code to computer code. The goal of the genome project, in this parlance, is to identify and catalog 75,000 or more files (genes) in the software that directs construction of a self-modifying and self-replicating system -- a living organism.

  20. Origins of the Human Genome Project

    SciTech Connect

    Cook-Deegan, Robert

    1993-07-01

    The human genome project was born of technology, grew into a science bureaucracy in the US and throughout the world, and is now being transformed into a hybrid academic and commercial enterprise. The next phase of the project promises to veer more sharply toward commercial application, harnessing both the technical prowess of molecular biology and the rapidly growing body of knowledge about DNA structure to the pursuit of practical benefits. Faith that the systematic analysis of DNA structure will prove to be a powerful research tool underlies the rationale behind the genome project. The notion that most genetic information is embedded in the sequence of DNA base pairs comprising chromosomes is a central tenet. A rough analogy is to liken an organism's genetic code to computer code. The goal of the genome project, in this parlance, is to identify and catalog 75,000 or more files (genes) in the software that directs construction of a self-modifying and self-replicating system -- a living organism.

  1. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  2. Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium

    PubMed Central

    Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.

    2016-01-01

    Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617

  3. Sequence composition and genome organization of maize

    PubMed Central

    Messing, Joachim; Bharti, Arvind K.; Karlowski, Wojciech M.; Gundlach, Heidrun; Kim, Hye Ran; Yu, Yeisoo; Wei, Fusheng; Fuks, Galina; Soderlund, Carol A.; Mayer, Klaus F. X.; Wing, Rod A.

    2004-01-01

    Zea mays L. ssp. mays, or corn, one of the most important crops and a model for plant genetics, has a genome ≈80% the size of the human genome. To gain global insight into the organization of its genome, we have sequenced the ends of large insert clones, yielding a cumulative length of one-eighth of the genome with a DNA sequence read every 6.2 kb, thereby describing a large percentage of the genes and transposable elements of maize in an unbiased approach. Based on the accumulative 307 Mb of sequence, repeat sequences occupy 58% and genic regions occupy 7.5%. A conservative estimate predicts ≈59,000 genes, which is higher than in any other organism sequenced so far. Because the sequences are derived from bacterial artificial chromosome clones, which are ordered in overlapping bins, tagged genes are also ordered along continuous chromosomal segments. Based on this positional information, roughly one-third of the genes appear to consist of tandemly arrayed gene families. Although the ancestor of maize arose by tetraploidization, fewer than half of the genes appear to be present in two orthologous copies, indicating that the maize genome has undergone significant gene loss since the duplication event. PMID:15388850
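    The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: it treats "one-eighth of the genome" and "307 Mb" as exact values, which the abstract does not claim, and the variable names are made up for this example.

```python
# Rough cross-check of the maize figures quoted in the abstract.
# Assumption: "one-eighth" and "307 Mb" are treated as exact for illustration.
SEQUENCED_MB = 307.0      # cumulative sequence reported in the abstract
GENOME_FRACTION = 1 / 8   # share of the genome that the sequence represents
REPEAT_SHARE = 0.58       # repeats as a share of the sampled sequence
GENIC_SHARE = 0.075       # genic regions as a share of the sampled sequence

# Implied genome size if 307 Mb is one-eighth of the whole.
implied_genome_mb = SEQUENCED_MB / GENOME_FRACTION

# Amount of sampled sequence falling into each class.
repeat_mb = SEQUENCED_MB * REPEAT_SHARE
genic_mb = SEQUENCED_MB * GENIC_SHARE

print(f"implied genome size: {implied_genome_mb:.0f} Mb")
print(f"repeat sequence in sample: {repeat_mb:.1f} Mb")
print(f"genic sequence in sample: {genic_mb:.1f} Mb")
```

    Under these assumptions the implied genome size is roughly 2.5 Gb, which is in line with the abstract's statement that the maize genome is about 80% the size of the human genome.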

  4. Comparison of 61 Sequenced Escherichia coli Genomes

    PubMed Central

    Lukjancenko, Oksana; Wassenaar, Trudy M.

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or ‘accessory’ genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group of Enterobacteriaceae. PMID:20623278
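    The pan- and core-genome definitions used in this abstract reduce to set operations: the pan-genome is the union of gene families over all genomes, and the core genome is their intersection. The sketch below is a minimal illustration on made-up toy data; the strain names, family identifiers, and the function `pan_and_core` are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
def pan_and_core(genomes):
    """Return (pan, core) for a dict mapping strain name -> set of gene families."""
    families = list(genomes.values())
    # Pan-genome: every gene family seen in at least one genome.
    pan = set().union(*families)
    # Core genome: gene families present in every genome.
    core = set.intersection(*families) if families else set()
    return pan, core


# Hypothetical toy input: three strains and five gene families.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}

pan, core = pan_and_core(toy_genomes)
accessory = pan - core  # variable ("accessory") families

# Core share of the pan-genome, using the counts quoted in the abstract.
reported_core_share = 993 / 15741

print(sorted(pan), sorted(core), sorted(accessory))
print(f"reported core share: {reported_core_share:.1%}")
```

    With the reported counts, the core share comes to a little over 6%, consistent with the statement that accessory genes make up more than 90% of the pan-genome.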

  5. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

    PubMed Central

    Woyke, Tanja; Eisen, Jonathan A.; Garrity, George; Lilburn, Timothy G.; Beck, Brian J.; Whitman, William B.; Hugenholtz, Phil; Klenk, Hans-Peter

    2013-01-01

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use of the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea. PMID:25197443

  6. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project.

    PubMed

    Kyrpides, Nikos C; Woyke, Tanja; Eisen, Jonathan A; Garrity, George; Lilburn, Timothy G; Beck, Brian J; Whitman, William B; Hugenholtz, Phil; Klenk, Hans-Peter

    2014-06-15

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use of the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.

  7. In Silico Whole Genome Sequencer and Analyzer (iWGS): a Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    PubMed Central

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; Kurtzman, Cletus P.; Hittinger, Chris Todd; Rokas, Antonis

    2016-01-01

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS. PMID:27638685

  8. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies.

    PubMed

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; Kurtzman, Cletus P; Hittinger, Chris Todd; Rokas, Antonis

    2016-09-16

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in non-model organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

  9. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    SciTech Connect

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; Kurtzman, Cletus P.; Hittinger, Chris Todd; Rokas, A.

    2016-09-16

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

  10. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    DOE PAGES

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

    2016-09-16

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

  11. Complete genome sequence of Geodermatophilus obscurus type strain (G-20).

    PubMed

    Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Munk, Christine; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Meincke, Linda; Brettin, Thomas; Detter, John C; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-03-30

    Geodermatophilus obscurus Luedemann 1968 is the type species of the genus, which is the type genus of the family Geodermatophilaceae. G. obscurus is of interest as it has frequently been isolated from stressful environments such as rock varnish in deserts, and as it exhibits interesting phenotypes such as lytic capability of yeast cell walls, UV-C resistance, strong production of extracellular functional amyloid (FuBA) and manganese oxidation. This is the first completed genome sequence of the family Geodermatophilaceae. The 5,322,497 bp long genome with its 5,161 protein-coding and 58 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  12. A physical map of the papaya genome with integrated genetic map and genome sequence

    PubMed Central

    2009-01-01

    Background: Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly. Results: The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the adjacent region of the male specific region of the Y chromosome (MSY), and recombination rates were recovered gradually and then exceeded the genome average. Recombination hotspots were observed at about 10 Mb away on both sides of the MSY, showing 7-fold increase compared with the genome wide average, demonstrating the dynamics of recombination of the sex chromosomes. Conclusion: A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. The integrated map facilitated the draft genome assembly

  13. Genome Sequence of Yersinia pestis KIM

    PubMed Central

    Deng, Wen; Burland, Valerie; Plunkett III, Guy; Boutin, Adam; Mayhew, George F.; Liss, Paul; Perna, Nicole T.; Rose, Debra J.; Mau, Bob; Zhou, Shiguo; Schwartz, David C.; Fetherston, Jaqueline D.; Lindler, Luther E.; Brubaker, Robert R.; Plano, Gregory V.; Straley, Susan C.; McDonough, Kathleen A.; Nilles, Matthew L.; Matson, Jyl S.; Blattner, Frederick R.; Perry, Robert D.

    2002-01-01

    We present the complete genome sequence of Yersinia pestis KIM, the etiologic agent of bubonic and pneumonic plague. The strain KIM, biovar Mediaevalis, is associated with the second pandemic, including the Black Death. The 4.6-Mb genome encodes 4,198 open reading frames (ORFs). The origin, terminus, and most genes encoding DNA replication proteins are similar to those of Escherichia coli K-12. The KIM genome sequence was compared with that of Y. pestis CO92, biovar Orientalis, revealing homologous sequences but a remarkable amount of genome rearrangement for strains so closely related. The differences appear to result from multiple inversions of genome segments at insertion sequences, in a manner consistent with present knowledge of replication and recombination. There are few differences attributable to horizontal transfer. The KIM and E. coli K-12 genome proteins were also compared, exposing surprising amounts of locally colinear “backbone,” or synteny, that is not discernible at the nucleotide level. Nearly 54% of KIM ORFs are significantly similar to K-12 proteins, with conserved housekeeping functions. However, a number of E. coli pathways and transport systems and at least one global regulator were not found, reflecting differences in lifestyle between them. In KIM-specific islands, new genes encode candidate pathogenicity proteins, including iron transport systems, putative adhesins, toxins, and fimbriae. PMID:12142430

  14. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  15. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    PubMed Central

    Mavromatis, Konstantinos; Sikorski, Johannes; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Thomas; Detter, John C.; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rüdiger; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family ‘Alicyclobacillaceae’. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family ‘Alicyclobacillaceae’. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304673

  16. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    PubMed

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33).

    PubMed

    Munk, A Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2011-07-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus to the family Tsukamurellaceae. The species is not only of interest because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

    SciTech Connect

    Mavromatis, K; Sikorski, Johannes; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Meincke, Linda; Sims, David; Chertkov, Olga; Han, Cliff; Brettin, Tom; Detter, J C; Wahrenburg, Claudia; Rohde, Manfred; Pukall, Rudiger; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family Alicyclobacillaceae. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have since long been subjected to detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Alicyclobacillaceae. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575).

    PubMed

    Spring, Stefan; Lapidus, Alla; Schröder, Maren; Gleim, Dorothea; Sims, David; Meincke, Linda; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Brettin, Thomas; Detter, John C; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-11-22

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575(T), isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Complete genome sequence of Chitinophaga pinensis type strain (UQM 2034).

    PubMed

    Glavina Del Rio, Tijana; Abt, Birte; Spring, Stefan; Lapidus, Alla; Nolan, Matt; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chain, Patrick; Saunders, Elizabeth; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lucas, Susan

    2010-02-28

    Chitinophaga pinensis Sangkhobol and Skerman 1981 is the type strain of the species which is the type species of the rapidly growing genus Chitinophaga in the sphingobacterial family 'Chitinophagaceae'. Members of the genus Chitinophaga vary in shape between filaments and spherical bodies without the production of a fruiting body, produce myxospores, and are of special interest for their ability to degrade chitin. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family 'Chitinophagaceae', and the 9,127,347 bp long single replicon genome with its 7,397 protein-coding and 95 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Desulfohalobium retbaense type strain (HR(100)).

    PubMed

    Spring, Stefan; Nolan, Matt; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Land, Miriam; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Munk, Christine; Kiss, Hajnalka; Chain, Patrick; Han, Cliff; Brettin, Thomas; Detter, John C; Schüler, Esther; Göker, Markus; Rohde, Manfred; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-28

    Desulfohalobium retbaense (Ollivier et al. 1991) is the type species of the polyphyletic genus Desulfohalobium, which comprises, at the time of writing, two species and represents the family Desulfohalobiaceae within the Deltaproteobacteria. D. retbaense is a moderately halophilic sulfate-reducing bacterium, which can utilize H(2) and a limited range of organic substrates, which are incompletely oxidized to acetate and CO(2), for growth. The type strain HR(100) (T) was isolated from sediments of the hypersaline Retba Lake in Senegal. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Desulfohalobiaceae. The 2,909,567 bp genome (one chromosome and a 45,263 bp plasmid) with its 2,552 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Nocardiopsis dassonvillei type strain (IMRU 509).

    PubMed

    Sun, Hui; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Pagani, Ioanna; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Djao, Olivier Duplex Ngatchou; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-11-30

    Nocardiopsis dassonvillei (Brocq-Rousseau 1904) Meyer 1976 is the type species of the genus Nocardiopsis, which in turn is the type genus of the family Nocardiopsaceae. This species is of interest because of its ecological versatility. Members of N. dassonvillei have been isolated from a large variety of natural habitats such as soil and marine sediments, from different plant and animal materials as well as from human patients. Moreover, representatives of the genus Nocardiopsis participate actively in biopolymer degradation. This is the first complete genome sequence in the family Nocardiopsaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,543,312 bp long genome consists of a 5.77 Mbp chromosome and a 0.78 Mbp plasmid and, with its 5,570 protein-coding and 77 RNA genes, is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)

    SciTech Connect

    Spring, Stefan; Lapidus, Alla L.; Schroder, Maren; Gleim, Dorothea; Sims, David; Meincke, Linda; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Brettin, Tom; Detter, J. Chris; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-01-01

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    SciTech Connect

    Pati, Amrita; Gronow, Sabine; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Chertkov, Olga; Bruce, David; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Jeffries, Cynthia; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment in contrast to many other Arcobacter species which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Genome sequence of Phytophthora ramorum: implications for management

    Treesearch

    Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore

    2006-01-01

    A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
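
    To give a feel for what 7-fold and 9-fold coverage mean, the sketch below uses the idealized Lander-Waterman (Poisson) model, under which the expected fraction of a genome left unsequenced at depth c is e^(-c). This model is not stated in the abstract; it assumes reads are placed uniformly at random, which real shotgun data only approximate. The function name is hypothetical.

```python
import math


def expected_uncovered_fraction(depth: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model.

    Assumption (not from the abstract): reads land uniformly at random,
    so per-base read counts are Poisson with mean `depth`.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return math.exp(-depth)


if __name__ == "__main__":
    # Depths quoted in the abstract for P. ramorum (7x) and P. sojae (9x).
    for organism, depth in (("P. ramorum", 7), ("P. sojae", 9)):
        gap = expected_uncovered_fraction(depth)
        print(f"{organism}: {depth}x depth, expected uncovered fraction ~ {gap:.2e}")
```

    Under these assumptions both depths leave well under 0.1% of bases unsampled in expectation; repeats and cloning bias typically make real gaps larger.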

  6. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  7. Genome Update. Let the consumer beware: Streptomyces genome sequence quality.

    PubMed

    Studholme, David J

    2016-01-01

    A genome sequence assembly represents a model of a genome. This article explores some tools and methods for assessing the quality of an assembly, using publicly available data for Streptomyces species as the example. There is great variability in quality of assemblies deposited in GenBank. Only in a small minority of these assemblies are the raw data available, enabling full appraisal of the assembly quality.

  8. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  9. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  10. [The Human Genome Project, genetic viability and genetic epidemiology].

    PubMed

    Hagymási, Krisztina; Tulassay, Zsolt

    2005-12-18

    The goal of the Human Genome Project to elucidate the complete sequence of the human genome has been achieved. The aims of the "post-genome" era are explaining the genetic information, characterisation of functional elements encoded in the human genome and mapping the human genetic variability as well. Two unrelated human beings also share 99.9% of their genomic sequence. The difference of 0.1% is the result of genetic polymorphisms: single nucleotide polymorphisms, repetitive sequences and insertion/deletion. The genetic differences, coupled with environmental exposures will determine the phenotypic variation we observe in health or disease. The disease-causing genetic variants can be identified by linkage analysis or association studies. The knowledge of human genome and application of multiple biomarkers will improve our ability to identify individuals at risk, so that preventive interventions can be applied, earlier diagnosis can be made and treatment can be optimized.

  11. Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)

    SciTech Connect

    Daligault, Hajnalka E.; Lapidus, Alla L.; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Huntemann, Marcel; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Rohde, Manfred; Verbarg, Susanne; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2011-01-01

    Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Haliscomenobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically uncharted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  12. Accelerating Genome Sequencing 100X with FPGAs

    SciTech Connect

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

  13. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  14. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  15. Genomic Sequence Variation Markup Language (GSVML).

    PubMed

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as

  16. Sorghum Genome Sequencing by Methylation Filtration

    PubMed Central

    Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rohlfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W. Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  17. Sorghum genome sequencing by methylation filtration.

    PubMed

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rohlfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  18. The 1000 Genomes Project: data management and community access.

    PubMed

    Clarke, Laura; Zheng-Bradley, Xiangqun; Smith, Richard; Kulesha, Eugene; Xiao, Chunlin; Toneva, Iliana; Vaughan, Brendan; Preuss, Don; Leinonen, Rasko; Shumway, Martin; Sherry, Stephen; Flicek, Paul

    2012-04-27

    The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

  19. Easy quantitative assessment of genome editing by sequence trace decomposition

    PubMed Central

    Brinkman, Eva K.; Chen, Tao; Amendola, Mario; van Steensel, Bas

    2014-01-01

    The efficacy and the mutation spectrum of genome editing methods can vary substantially depending on the targeted sequence. A simple, quick assay to accurately characterize and quantify the induced mutations is therefore needed. Here we present TIDE, a method for this purpose that requires only a pair of PCR reactions and two standard capillary sequencing runs. The sequence traces are then analyzed by a specially developed decomposition algorithm that identifies the major induced mutations in the projected editing site and accurately determines their frequency in a cell population. This method is cost-effective and quick, and it provides much more detailed information than current enzyme-based assays. An interactive web tool for automated decomposition of the sequence traces is available. TIDE greatly facilitates the testing and rational design of genome editing strategies. PMID:25300484

  20. Easy quantitative assessment of genome editing by sequence trace decomposition.

    PubMed

    Brinkman, Eva K; Chen, Tao; Amendola, Mario; van Steensel, Bas

    2014-12-16

    The efficacy and the mutation spectrum of genome editing methods can vary substantially depending on the targeted sequence. A simple, quick assay to accurately characterize and quantify the induced mutations is therefore needed. Here we present TIDE, a method for this purpose that requires only a pair of PCR reactions and two standard capillary sequencing runs. The sequence traces are then analyzed by a specially developed decomposition algorithm that identifies the major induced mutations in the projected editing site and accurately determines their frequency in a cell population. This method is cost-effective and quick, and it provides much more detailed information than current enzyme-based assays. An interactive web tool for automated decomposition of the sequence traces is available. TIDE greatly facilitates the testing and rational design of genome editing strategies. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Complete genome sequence of Acetohalobium arabaticum type strain (Z-7288T)

    PubMed Central

    Sikorski, Johannes; Lapidus, Alla; Chertkov, Olga; Lucas, Susan; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Brambilla, Evelyne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Bruce, David; Detter, Chris; Tapia, Roxanne; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Spring, Stefan; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Acetohalobium arabaticum Zhilina and Zavarzin 1990 is of special interest because of its physiology and its participation in the anaerobic C1-trophic chain in hypersaline environments. This is the first completed genome sequence of the family Halobacteroidaceae and only the second genome sequence in the order Halanaerobiales. The 2,469,596 bp long genome with its 2,353 protein-coding and 90 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304692

  2. Complete genome sequence of Vulcanisaeta distributa type strain (IC-017T)

    SciTech Connect

    Mavromatis, K; Sikorski, Johannes; Pabst, Elke; Teshima, Hazuki; Lapidus, Alla L.; Lucas, Susan; Nolan, Matt; Glavina Del Rio, Tijana; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Spring, Stefan; Goker, Markus; Wirth, Reinhard; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and the seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Project 1: Microbial Genomes: A Genomic Approach to Understanding the Evolution of Virulence. Project 2: From Genomes to Life: Drosophila Development in Space and Time

    SciTech Connect

    Robert DeSalle

    2004-09-10

    This project seeks to use the genomes of two close relatives, A. actinomycetemcomitans and H. aphrophilus, to understand the evolutionary changes that take place in a genome to make it more or less virulent. Our primary specific aim of this project was to sequence, annotate, and analyze the genomes of Actinobacillus actinomycetemcomitans (CU1000, serotype f) and Haemophilus aphrophilus. With these genome sequences we have then compared the whole genome sequences to each other and to the current Aa (HK1651 www.genome.ou.edu) genome project sequence along with other fully sequenced Pasteurellaceae to determine inter and intra species differences that may account for the differences and similarities in disease. We also propose to create and curate a comprehensive database where sequence information and analysis for the Pasteurellaceae (family that includes the genera Actinobacillus and Haemophilus) are readily accessible. And finally we have proposed to develop phylogenetic techniques that can be used to efficiently and accurately examine the evolution of genomes. Below we report on progress we have made on these major specific aims. Progress on the specific aims is reported below under two major headings--experimental approaches and bioinformatics and systematic biology approaches.

  4. Complete genome sequence of Caulobacter crescentus

    PubMed Central

    Nierman, William C.; Feldblyum, Tamara V.; Laub, Michael T.; Paulsen, Ian T.; Nelson, Karen E.; Eisen, Jonathan; Heidelberg, John F.; Alley, M. R. K.; Ohta, Noriko; Maddock, Janine R.; Potocka, Isabel; Nelson, William C.; Newton, Austin; Stephens, Craig; Phadke, Nikhil D.; Ely, Bert; DeBoy, Robert T.; Dodson, Robert J.; Durkin, A. Scott; Gwinn, Michelle L.; Haft, Daniel H.; Kolonay, James F.; Smit, John; Craven, M. B.; Khouri, Hoda; Shetty, Jyoti; Berry, Kristi; Utterback, Teresa; Tran, Kevin; Wolf, Alex; Vamathevan, Jessica; Ermolaeva, Maria; White, Owen; Salzberg, Steven L.; Venter, J. Craig; Shapiro, Lucy; Fraser, Claire M.

    2001-01-01

    The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus. PMID:11259647

  5. Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens

    PubMed Central

    Staats, Martijn; Erkens, Roy H. J.; van de Vossenberg, Bart; Wieringa, Jan J.; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E.; Bakker, Freek T.

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2–71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more

  6. Mapping and sequencing of structural variation from eight human genomes

    PubMed Central

    Kidd, Jeffrey M.; Cooper, Gregory M.; Donahue, William F.; Hayden, Hillary S.; Sampas, Nick; Graves, Tina; Hansen, Nancy; Teague, Brian; Alkan, Can; Antonacci, Francesca; Haugen, Eric; Zerr, Troy; Yamada, N. Alice; Tsang, Peter; Newman, Tera L.; Tüzün, Eray; Cheng, Ze; Ebling, Heather M.; Tusneem, Nadeem; David, Robert; Gillett, Will; Phelps, Karen A.; Weaver, Molly; Saranga, David; Brand, Adrianne; Tao, Wei; Gustafson, Erik; McKernan, Kevin; Chen, Lin; Malig, Maika; Smith, Joshua D.; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David A.; Peiffer, Daniel A.; Dorschner, Michael; Stamatoyannopoulos, John; Schwartz, David; Nickerson, Deborah A.; Mullikin, James C.; Wilson, Richard K.; Bruhn, Laurakay; Olson, Maynard V.; Kaul, Rajinder; Smith, Douglas R.; Eichler, Evan E.

    2008-01-01

    Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects. PMID:18451855

  7. Complete genome sequence of Pyrolobus fumarii type strain (1AT)

    SciTech Connect

    Anderson, Iain; Goker, Markus; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Ivanova, N; Pagani, Ioanna; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Huber, Harald; Yasawong, Montri; Rohde, Manfred; Spring, Stefan; Abt, Birte; Sikorski, Johannes; Wirth, Reinhard; Detter, J. Chris; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2011-01-01

    Pyrolobus fumarii Blöchl et al. 1997 is the type species of the genus Pyrolobus, which belongs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii exhibits currently the highest optimal growth temperature of all life forms on earth (106 °C). This is the first completed genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on September 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. The complete genome sequence of Mycobacterium bovis

    PubMed Central

    Garnier, Thierry; Eiglmeier, Karin; Camus, Jean-Christophe; Medina, Nadine; Mansoor, Huma; Pryor, Melinda; Duthoy, Stephanie; Grondin, Sophie; Lacroix, Celine; Monsempe, Christel; Simon, Sylvie; Harris, Barbara; Atkin, Rebecca; Doggett, Jon; Mayes, Rebecca; Keating, Lisa; Wheeler, Paul R.; Parkhill, Julian; Barrell, Bart G.; Cole, Stewart T.; Gordon, Stephen V.; Hewinson, R. Glyn

    2003-01-01

    Mycobacterium bovis is the causative agent of tuberculosis in a range of animal species and man, with worldwide annual losses to agriculture of $3 billion. The human burden of tuberculosis caused by the bovine tubercle bacillus is still largely unknown. M. bovis was also the progenitor for the M. bovis bacillus Calmette–Guérin vaccine strain, the most widely used human vaccine. Here we describe the 4,345,492-bp genome sequence of M. bovis AF2122/97 and its comparison with the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Strikingly, the genome sequence of M. bovis is >99.95% identical to that of M. tuberculosis, but deletion of genetic information has led to a reduced genome size. Comparison with M. leprae reveals a number of common gene losses, suggesting the removal of functional redundancy. Cell wall components and secreted proteins show the greatest variation, indicating their potential role in host–bacillus interactions or immune evasion. Furthermore, there are no genes unique to M. bovis, implying that differential gene expression may be the key to the host tropisms of human and bovine bacilli. The genome sequence therefore offers major insight on the evolution, host preference, and pathobiology of M. bovis. PMID:12788972

  9. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  10. Human genome project: revolutionizing biology through leveraging technology

    NASA Astrophysics Data System (ADS)

    Dahl, Carol A.; Strausberg, Robert L.

    1996-04-01

    The Human Genome Project (HGP) is an international project to develop genetic, physical, and sequence-based maps of the human genome. Since the inception of the HGP it has been clear that substantially improved technology would be required to meet the scientific goals, particularly in order to acquire the complete sequence of the human genome, and that these technologies coupled with the information forthcoming from the project would have a dramatic effect on the way biomedical research is performed in the future. In this paper, we discuss the state-of-the-art for genomic DNA sequencing, technological challenges that remain, and the potential technological paths that could yield substantially improved genomic sequencing technology. The impact of the technology developed from the HGP is broad-reaching and a discussion of other research and medical applications that are leveraging HGP-derived DNA analysis technologies is included. The multidisciplinary approach to the development of new technologies that has been successful for the HGP provides a paradigm for facilitating new genomic approaches toward understanding the biological role of functional elements and systems within the cell, including those encoded within genomic DNA and their molecular products.

  11. A map of human genome variation from population scale sequencing

    PubMed Central

    2011-01-01

    The 1000 Genomes Project aims to provide a deep characterisation of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. We present results of the pilot phase of the project, designed to develop and compare different strategies for genome wide sequencing with high throughput sequencing platforms. We undertook three projects: low coverage whole genome sequencing of 179 individuals from four populations, high coverage sequencing of two mother-father-child trios, and exon targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million SNPs, 1 million short insertions and deletions and 20,000 structural variants, the majority of which were previously undescribed. We show that over 95% of the currently accessible variants found in any individual are present in this dataset; on average, each person carries approximately 250 to 300 loss of function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We find many putative functional variants with large allele frequency differences between populations. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. PMID:20981092
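
To give the per-base mutation rate quoted above a sense of scale, the following minimal sketch converts it into an expected number of new substitutions per genome per generation. The haploid genome size of 3.0 × 10⁹ bp is a round-number assumption that does not appear in the abstract, and the function name `expected_de_novo` is purely illustrative.

```python
def expected_de_novo(rate_per_bp, genome_bp, copies=2):
    """Expected new base substitutions per generation.

    rate_per_bp : mutation rate per base pair per generation
    genome_bp   : haploid genome length in base pairs
    copies      : genome copies counted (1 = haploid, 2 = diploid)
    """
    return rate_per_bp * genome_bp * copies


RATE = 1e-8          # approximate rate reported in the abstract
HAPLOID_BP = 3.0e9   # ASSUMPTION: round figure for the human haploid genome

print(round(expected_de_novo(RATE, HAPLOID_BP, copies=1)))  # per haploid genome
print(round(expected_de_novo(RATE, HAPLOID_BP)))            # per diploid genome
```

Under that assumed genome size, the estimate is on the order of tens of new substitutions per child; the exact figure scales linearly with whatever genome length is plugged in.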

  12. [Mapping and human genome sequence program].

    PubMed

    Weissenbach, J

    1997-03-01

    Until recently, human genome programs focused primarily on establishing maps that would provide signposts to researchers seeking to identify genes responsible for inherited diseases, as well as a basis for genome sequencing studies. Preestablished gene mapping goals have been reached. The over 7,000 microsatellite markers identified to date provide a map of sufficient density to allow localization of the gene of a monogenic disease with a precision of 1 to 2 million base pairs. The physical map, based on systematically arranged overlapping sets of artificial yeast chromosomes (YACs), has also made considerable headway during the last few years. The most recently published map covers more than 90% of the genome. However, currently available physical maps cannot be used for sequencing studies because multiple rearrangements occur in YACs. The recently developed sets of radioinduced hybrids are extremely useful for incorporating genes into existing maps. A network of American and European laboratories has successfully used these radioinduced hybrids to map 15,000 gene tags from large-scale cDNA library sequencing programs. There are increasingly pressing reasons for initiating large scale human genome sequencing studies.

  13. Draft Genome Sequence of Lactobacillus plantarum 2165

    PubMed Central

    Abramov, Vyacheslav M.

    2014-01-01

    This report describes a draft genome sequence of Lactobacillus plantarum 2165. The data demonstrate the presence of a large number of genes responsible for sugar metabolism and the fermentation activity of this bacterium. Different cell surface proteins, including fibronectin and mucus-binding adhesins, may contribute to the beneficial probiotic properties of this strain. PMID:24407651

  14. Genome sequence of Bacillus licheniformis WX-02.

    PubMed

    Yangtse, Wuming; Zhou, Yinhua; Lei, Yang; Qiu, Yimin; Wei, Xuetuan; Ji, Zhixia; Qi, Gaofu; Yong, Yangchun; Chen, Lingling; Chen, Shouwen

    2012-07-01

    Bacillus licheniformis is an important bacterium that has been used extensively for large-scale industrial production of exoenzymes and peptide antibiotics. B. licheniformis WX-02 produces poly-gamma-glutamate increasingly when fermented under stress conditions. Here its genome sequence (4,270,104 bp, with G+C content of 46.06%), which comprises a circular chromosome, is announced.

  15. Draft genome sequence of Bacillus oceanisediminis 2691.

    PubMed

    Lee, Yong-Jik; Lee, Sang-Jae; Jeong, Haeyoung; Kim, Hyun Ju; Ryu, Naeun; Kim, Byoung-Chan; Lee, Han-Seung; Lee, Dong-Woo; Lee, Sang Jun

    2012-11-01

    Bacillus oceanisediminis 2691 is an aerobic, Gram-positive, spore-forming, and moderately halophilic bacterium that was isolated from marine sediment of the Yellow Sea coast of South Korea. Here, we report the draft genome sequence of B. oceanisediminis 2691 that may have an important role in the bioremediation of marine sediment.

  16. Complete Genome Sequences of 61 Mycobacteriophages

    PubMed Central

    2016-01-01

    Mycobacteriophages—viruses of mycobacteria—provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc²155 that expand our understanding of phage diversity. PMID:27389257

  17. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide.

    PubMed

    Liolios, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Kyrpides, Nikos C

    2006-01-01

    The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at http://www.genomesonline.org. It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/

  18. Whole Genome Sequence of a Turkish Individual

    PubMed Central

    Dogan, Haluk; Can, Handan; Otu, Hasan H.

    2014-01-01

    Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We have performed 35x coverage using paired-end sequencing, where over 95% of sequencing reads are mapped to the reference genome covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygous/heterozygous (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion ratios (2,383,204∶1,154,590 or 2.06∶1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from −52 bp to 34 bp in length and causing about 180 codon insertion/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale. PMID:24416366
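
The two SNP ratios quoted in this abstract can be checked directly from the reported counts. The sketch below does that arithmetic and also shows the standard way a single SNP is classified as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. The helper names `is_transition` and `ti_tv_ratio` and the toy SNP list are illustrative assumptions, not part of the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref->alt stays within purines or within pyrimidines."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    snps = [(r.upper(), a.upper()) for r, a in snps if r.upper() != a.upper()]
    ti = sum(1 for r, a in snps if is_transition(r, a))
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


# Counts as reported in the abstract.
TOTAL_SNPS = 3_537_794
HOM, HET = 1_415_123, 2_122_671
TI, TV = 2_383_204, 1_154_590

# Both splits partition the same set of SNP calls.
assert HOM + HET == TOTAL_SNPS
assert TI + TV == TOTAL_SNPS

print(round(HET / HOM, 2))  # heterozygous calls per homozygous call -> 1.5
print(round(TI / TV, 2))    # transitions per transversion -> 2.06

# Hypothetical toy input: two transitions (A>G, C>T), one transversion (A>C).
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # -> 2.0
```

Both partitions sum exactly to the 3,537,794 SNP calls, which is a quick internal-consistency check on the reported figures.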

  19. Hardware accelerator for genomic sequence alignment.

    PubMed

    Chiang, Jason; Studniberg, Michael; Shaw, Jack; Seto, Shaw; Truong, Kevin

    2006-01-01

    To infer homology and subsequently gene function, the Smith-Waterman algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain billions of sequences, this algorithm becomes computationally expensive. Consequently, in this paper, we focused on accelerating the Smith-Waterman algorithm by replacing the computationally repeated portion of the algorithm with FPGA hardware custom instructions. These simple modifications accelerated the algorithm runtime by an average of 287% compared to the pure software implementation. Therefore, further design of FPGA-accelerated hardware offers a promising direction for improving the runtime of genomic database searches.
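
    For orientation, the Smith-Waterman recurrence named above can be sketched in plain Python. This is a software reference only, not the FPGA design from the paper; the scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration. The nested loop that fills the score matrix is the kind of repeated inner computation that hardware acceleration typically targets.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Uses a linear gap penalty; the scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Fill the matrix: each cell depends on its upper, left and diagonal neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGACGTCC", "TTACGTAA"))
```

    The run time grows with the product of the two sequence lengths, which is why searching a large database with this algorithm becomes expensive.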

  20. A Draft Sequence of the Neandertal Genome

    PubMed Central

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  1. Complete genome sequence of Intrasporangium calvum type strain (7 KIPT)

    SciTech Connect

    Glavina Del Rio, Tijana; Chertkov, Olga; Yasawong, Montri; Lucas, Susan; Deshpande, Shweta; Cheng, Jan-Fang; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Pukall, Rudiger; Sikorski, Johannes; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Intrasporangium calvum Kalakoutskii et al. 1967 is the type species of the genus Intrasporangium, which belongs to the actinobacterial family Intrasporangiaceae. The species is a Gram-positive bacterium that forms a branching mycelium, which tends to break into irregular fragments. The mycelium of this strain may bear intercalary vesicles but does not contain spores. The strain described in this study is an airborne organism that was isolated from a school dining room in 1967. One particularly interesting feature of I. calvum is that the type of its menaquinone is different from all other representatives of the family Intrasporangiaceae. This is the first completed genome sequence from a member of the genus Intrasporangium and also the first sequence from the family Intrasporangiaceae. The 4,024,382 bp long genome with its 3,653 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Ferrimonas balearica type strain (PATT)

    SciTech Connect

    Nolan, Matt; Sikorski, Johannes; Davenport, Karen W.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Tapia, Roxanne; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Yasawong, Montri; Rohde, Manfred; Tindall, Brian; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Ferrimonas balearica (Rossello-Mora et al. 1996) is the type species of the genus Ferrimonas, which belongs to the gammaproteobacterial family Ferrimonadaceae. The species is a Gram-negative, motile, facultatively anaerobic and non-spore-forming bacterium, which is of special interest because it is a chemoorganotroph and has a strictly respiratory metabolism with oxygen, nitrate, Fe(III)-oxyhydroxide, Fe(III)-citrate, MnO2, selenate, selenite and thiosulfate as electron acceptors. This is the first completed genome sequence of a member of the genus Ferrimonas and also the first sequence from a member of the family Ferrimonadaceae. The 4,279,159 bp long genome with its 3,803 protein-coding and 144 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Complete genome sequence of Oceanithermus profundus type strain (506T)

    SciTech Connect

    Pati, Amrita; Zhang, Xiaojing; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Hauser, Loren John; Jeffries, Cynthia; Brambilla, Evelyne-Marie; Ruhl, Alina; Mwirichia, Romano; Rohde, Manfred; Tindall, Brian; Sikorski, Johannes; Wirth, Reinhard; Goker, Markus; Woyke, Tanja; Detter, J. Chris; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Land, Miriam L

    2011-01-01

    Oceanithermus profundus Miroshnichenko et al. 2003 is the type species of the genus Oceanithermus, which belongs to the family Thermaceae. The genus currently comprises two species whose members are thermophilic and are able to reduce sulfur compounds and nitrite. The organism is adapted to the salinity of sea water, is able to utilize a broad range of carbohydrates, some proteinaceous substrates, organic acids and alcohols. This is the first completed genome sequence of a member of the genus Oceanithermus and the fourth sequence from the family Thermaceae. The 2,439,291 bp long genome with its 2,391 protein-coding and 54 RNA genes consists of one chromosome and a 135,351 bp long plasmid, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  5. The Genome 10K Project: a way forward.

    PubMed

    Koepfli, Klaus-Peter; Paten, Benedict; O'Brien, Stephen J

    2015-01-01

    The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.

  6. Dominant short repeated sequences in bacterial genomes.

    PubMed

    Avershina, Ekaterina; Rudi, Knut

    2015-03-01

    We use a novel multidimensional searching approach to present the first exhaustive search for all possible repeated sequences in 166 genomes selected to cover the bacterial domain. We found an overrepresentation of repeated sequences in all but one of the genomes. The most prevalent repeats by far were related to interspaced short palindromic repeats (CRISPRs)—conferring bacterial adaptive immunity. We identified a deep branching clade of thermophilic Firmicutes containing the highest number of CRISPR repeats. We also identified a high prevalence of tandem repeated heptamers. In addition, we identified GC-rich repeats that could potentially be involved in recombination events. Finally, we identified repeats in a 16322 amino acid mega protein (involved in biofilm formation) and inverted repeats flanking miniature transposable elements (MITEs). In conclusion, the exhaustive search for repeated sequences identified new elements and distribution of these, which has implications for understanding both the ecology and evolution of bacteria.
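
    To make the notion of tandem repeated heptamers concrete, here is a naive Python scan for back-to-back copies of a 7-mer. It is not the multidimensional search described in the abstract, whose details are not given here; the function name, parameters and the example sequence are hypothetical.

```python
def find_tandem_repeats(seq: str, k: int = 7, min_copies: int = 2):
    """Greedy left-to-right scan for runs of back-to-back copies of a k-mer.

    Returns a list of (start, unit, copies) tuples.
    """
    hits = []
    i, n = 0, len(seq)
    while i + k <= n:
        unit = seq[i:i + k]
        copies = 1
        # Count how many further copies of the unit follow immediately.
        while seq[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i += copies * k  # skip past the whole run
        else:
            i += 1
    return hits


# Made-up example sequence: three copies of ACGTGCA flanked by short spacers.
example = "TT" + "ACGTGCA" * 3 + "GG"
print(find_tandem_repeats(example))
```

    A scan like this only finds exact, adjacent copies of one fixed unit length; an exhaustive search over all repeat types, as described in the abstract, needs a more general method.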

  7. The Human Genome Project and biology education

    SciTech Connect

    McInerney, J.D.

    1995-12-01

    Within the last several years, biologists celebrated the fortieth anniversary of the Watson-Crick model of DNA and the fiftieth anniversary of the demonstration that DNA is the genetic material, discoveries that began a pervasive and ongoing revolution in biology and medicine. Nobelist Joshua Lederberg, for example, called the work of Avery's group "the most important discovery in biology in the twentieth century." This early work on DNA also contributed to a revolution in biology education, beginning in the 1960s. Like the biological revolution that is its counterpart, however, the educational revolution is incomplete, in part because the science continues to evolve, but primarily because scientists and science educators have not yet responded completely to the challenges of genetics and molecular biology. These challenges are made even more obvious by the scope and visibility of the Human Genome Project, the international project intended to map and sequence all human genes. Science educators face four challenges, discussed in this article using the Human Genome Project as an example: teaching for conceptual understanding, the nature of science, the personal and social impact of science and technology, and the principles of technology.

  8. Draft genome sequence of the Algerian bee Apis mellifera intermissa.

    PubMed

    Haddad, Nizar Jamal; Loucif-Ayad, Wahida; Adjlane, Noureddine; Saini, Deepti; Manchiganti, Rushiraj; Krishnamurthy, Venkatesh; AlShagoor, Banan; Batainh, Ahmed Mahmud; Mugasimangalam, Raja

    2015-06-01

    Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important due to its high ability to adapt to great variations in climatic conditions and due to its preferable cleaning behavior. Here we report the draft genome sequence of this honey bee; its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genome of other Apis mellifera sub-species promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation.

  9. Draft genome sequence of the Algerian bee Apis mellifera intermissa

    PubMed Central

    Haddad, Nizar Jamal; Loucif-Ayad, Wahida; Adjlane, Noureddine; Saini, Deepti; Manchiganti, Rushiraj; Krishnamurthy, Venkatesh; AlShagoor, Banan; Batainh, Ahmed Mahmud; Mugasimangalam, Raja

    2015-01-01

    Apis mellifera intermissa is the native honeybee subspecies of Algeria. A. m. intermissa occurs in Tunisia, Algeria and Morocco, between the Atlas and the Mediterranean and Atlantic coasts. This bee is very important due to its high ability to adapt to great variations in climatic conditions and due to its preferable cleaning behavior. Here we report the draft genome sequence of this honey bee; its Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSUV00000000. The 240-Mb genome is being annotated and analyzed. Comparison with the genome of other Apis mellifera sub-species promises to yield insights into the evolution of adaptations to high temperature and resistance to Varroa parasite infestation. PMID:26484171

  10. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  11. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der (Lawrence Berkeley Lab., CA)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  12. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    USDA-ARS's Scientific Manuscript database

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  13. Complete genome sequence of Sebaldella termitidis type strain (NCTC 11300).

    PubMed

    Harmon-Smith, Miranda; Celia, Laura; Chertkov, Olga; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Pati, Amrita; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Göker, Markus; Beck, Brian; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Sebaldella termitidis (Sebald 1962) Collins and Shah 1986, is the only species in the genus Sebaldella within the fusobacterial family 'Leptotrichiaceae'. The sole and type strain of the species was first isolated about 50 years ago from intestinal content of Mediterranean termites. The species is of interest for its very isolated phylogenetic position within the phylum Fusobacteria in the tree of life, with no other species sharing more than 90% 16S rRNA sequence similarity. The 4,486,650 bp long genome with its 4,210 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Draft Genome Sequence of Leucobacter sp. Strain UCD-THU (Phylum Actinobacteria)

    PubMed Central

    Holland-Moritz, Hannah E.; Bevans, Dakota R.; Lang, Jenna M.; Darling, Aaron E.; Coil, David A.

    2013-01-01

    Here we present the draft genome of Leucobacter sp. strain UCD-THU. The genome contains 3,317,267 bp in 11 scaffolds. This strain was isolated from a residential toilet as part of an undergraduate project to sequence reference genomes of microbes from the built environment. PMID:23792744

  15. Complete genome sequence of Streptosporangium roseum type strain (NI 9100T)

    SciTech Connect

    Nolan, Matt; Sikorski, Johannes; Jando, Marlen; Lucas, Susan; Lapidus, Alla L.; Glavina Del Rio, Tijana; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Sims, David; Meincke, Linda; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Bruce, David; Goodwin, Lynne A.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The pinkish coiled Streptomyces-like organism with a spore case was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  17. Agaricus bisporus genome sequence: a commentary.

    PubMed

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus bisporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium.

  18. Genome sequence of Aspergillus luchuensis NBRC 4314

    PubMed Central

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae, are involved. Whereas yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and the N- and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase related to citric acid production and fermentation at a low pH, as well as a unique glutamic peptidase, were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with the A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds to improve awamori fermentation. PMID:27651094

  19. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    PubMed Central

    Herring, Christopher D; Palsson, Bernhard Ø

    2007-01-01

    Background: With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results: In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion: CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions. PMID:17697331
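
    The detection rates implied by the counts above can be worked out in a few lines of Python. This is a small arithmetic sketch; the dictionary keys and the function name are illustrative, and only the two categories for which the abstract gives both a detected count and a known count are included.

```python
from fractions import Fraction

# (detected, known) counts as reported in the abstract above.
reported = {
    "small sequence differences": (7, 8),
    "IS element insertions": (9, 12),
}


def detection_rate(detected: int, known: int) -> Fraction:
    """Fraction of known differences that the method detected."""
    return Fraction(detected, known)


rates = {name: detection_rate(d, k) for name, (d, k) in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {float(rate):.1%}")
```

    This gives 87.5% for small sequence differences and 75.0% for IS element insertions.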

  20. Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria)

    PubMed Central

    Bendiks, Zachary A.; Lang, Jenna M.; Darling, Aaron E.; Coil, David A.

    2013-01-01

    Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment. PMID:23516225

  1. Current status of the Plasmodium falciparum genome project.

    PubMed

    Dame, J B; Arnot, D E; Bourke, P F; Chakrabarti, D; Christodoulou, Z; Coppel, R L; Cowman, A F; Craig, A G; Fischer, K; Foster, J; Goodman, N; Hinterberg, K; Holder, A A; Holt, D C; Kemp, D J; Lanzer, M; Lim, A; Newbold, C I; Ravetch, J V; Reddy, G R; Rubio, J; Schuster, S M; Su, X Z; Thompson, J K; Werner, E B

    1996-07-01

    The Plasmodium falciparum Genome Project is a collaborative effort by many laboratories that will provide detailed molecular information about the parasite, which may be used for developing practical control measures. Initial goals are to prepare an electronically indexed clone bank containing partially sequenced clones representing up to 80% of the parasite's genes and to prepare an ordered set of overlapping clones spanning each of the parasite's 14 chromosomes. Currently, clones of genomic DNA, prepared as yeast artificial chromosomes, are arranged into contigs covering approximately 70% of the genome of parasite clone 3D7, gene sequence tags are available from more than 20% of the parasite's genes, and approximately 5% of the parasite's genes are tentatively identified from similarity searches of entries in the international sequence databases. A total of >0.5 Mb of P. falciparum sequence tag data is available. The gene sequence tags are presently being used to complete YAC contig assembly and localize the cloned genes to positions on the physical map in preparation for sequencing the genome. Routes of access to project information and services are described.

  2. Synaptotagmin gene content of the sequenced genomes.

    PubMed

    Craxton, Molly

    2004-07-06

    Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their great diversification in animals. 
Synaptotagmins occur in

  3. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    PubMed Central

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla; Ivanova, Natalia; Copeland, Alex; Lucas, Susan; Del Rio, Tijana Glavina; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Sims, David; Meincke, Linda; Bruce, David; Goodwin, Lynne; Brettin, Thomas; Han, Cliff; Detter, John C.; Ovchinikova, Galina; Pati, Amrita; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Spröer, Cathrin; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Chain, Patrick

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the type and sole species of the genus Streptobacillus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum Fusobacteria. The 'Leptotrichiaceae' have not been well characterized, genomically or taxonomically. S. moniliformis is a Gram-negative, non-motile, pleomorphic bacterium and is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order Fusobacteriales and no more than the third sequence from the phylum Fusobacteria. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304670

  4. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE PAGES

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  5. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    SciTech Connect

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Tice, Hope; Dalin, Eileen; Bruce, David C.; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas S.; Detter, John C.; Han, Cliff S.; Larimer, Frank; Hauser, Loren; Land, Miriam; Ivanova, Natalia; Richardson, Paul; Cavicchioli, Ricardo; DasSarma, Shiladitya; Woese, Carl R.; Kyrpides, Nikos C.

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  6. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34.

    PubMed

    Anderson, Iain J; DasSarma, Priya; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Tice, Hope; Dalin, Eileen; Bruce, David C; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas S; Detter, John C; Han, Cliff S; Larimer, Frank; Hauser, Loren; Land, Miriam; Ivanova, Natalia; Richardson, Paul; Cavicchioli, Ricardo; DasSarma, Shiladitya; Woese, Carl R; Kyrpides, Nikos C

    2016-01-01

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. This genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  7. Complete genome sequence of Halomicrobium mukohataei type strain (arg-2).

    PubMed

    Tindall, Brian J; Schneider, Susanne; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mikhailova, Natalia; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Han, Cliff; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C; Detter, John C

    2009-11-22

    Halomicrobium mukohataei (Ihara et al. 1997) Oren et al. 2002 is the type species of the genus Halomicrobium. It is of phylogenetic interest because of its isolated location within the large euryarchaeal family Halobacteriaceae. H. mukohataei is an extreme halophile that grows essentially aerobically, but can also grow anaerobically under a change of morphology and with nitrate as electron acceptor. The strain, whose genome is described in this report, is a free-living, motile, Gram-negative euryarchaeon, originally isolated from Salinas Grandes in Jujuy, Andes highlands, Argentina. Its genome contains three genes for the 16S rRNA that differ from each other by up to 9%. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence from the poorly populated genus Halomicrobium, and the 3,332,349 bp long genome (chromosome and one plasmid) with its 3416 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

    PubMed Central

    Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

    2012-01-01

    We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480

  9. The first genome sequences of human bocaviruses from Vietnam

    PubMed Central

    2016-01-01

    As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the pathogen. PMID:28090592

  10. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  11. Sequencing of GJB2 in Cameroonians and Black South Africans and comparison to 1000 Genomes Project Data Support Need to Revise Strategy for Discovery of Nonsyndromic Deafness Genes in Africans.

    PubMed

    Bosch, Jason; Noubiap, Jean Jacques N; Dandara, Collet; Makubalo, Nomlindo; Wright, Galen; Entfellner, Jean-Baka Domelevo; Tiffin, Nicki; Wonkam, Ambroise

    2014-11-01

    Mutations in the GJB2 gene, encoding connexin 26, could account for 50% of congenital, nonsyndromic, recessive deafness cases in some Caucasian/Asian populations. There is a scarcity of published data in sub-Saharan Africans. We Sanger sequenced the coding region of the GJB2 gene in 205 Cameroonian and Xhosa South Africans with congenital, nonsyndromic deafness; and performed bioinformatic analysis of variations in the GJB2 gene, incorporating data from the 1000 Genomes Project. Amongst Cameroonian patients, 26.1% were familial. The majority of patients (70%) suffered from sensorineural hearing loss. Ten GJB2 genetic variants were detected by sequencing. A previously reported pathogenic mutation, g.3741_3743delTTC (p.F142del), and a putative pathogenic mutation, g.3816G>A (p.V167M), were identified in single heterozygous samples. Amongst the eight remaining variants, two novel variants, g.3318-41G>A and g.3332G>A, were reported. There were no statistically significant differences in allele frequencies between cases and controls. Principal Components Analyses differentiated between Africans, Asians, and Europeans, but only explained 40% of the variation. The present study is the first to compare African GJB2 sequences with the data from the 1000 Genomes Project and has revealed the low variation between population groups. This finding has emphasized the hypothesis that the prevalence of mutations in GJB2 in nonsyndromic deafness amongst European and Asian populations is due to founder effects arising after these individuals migrated out of Africa, and not to a putative "protective" variant in the genomic structure of GJB2 in Africans. Our results confirm that mutations in GJB2 are not associated with nonsyndromic deafness in Africans.

  12. Benchmark dataset for Whole Genome sequence compression.

    PubMed

    C L, Biji; Nair, Achuthsankar

    2016-05-16

    The research in DNA data compression lacks a standard dataset to test out compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset and proposes a benchmark dataset using a multistage sampling procedure. Considering the genome sequences of organisms available in the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1105 prokaryotes, 200 plasmids, 164 viruses and 65 eukaryotes. This paper reports the results of using 3 established tools on the newly compiled dataset and shows that their strengths and weaknesses are evident only with a comparison based on the scientifically compiled benchmark dataset.

  13. Complete Genome Sequence of Ikoma Lyssavirus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.

    2012-01-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection. PMID:22923801

  14. Simple sequence repeats in prokaryotic genomes

    PubMed Central

    Mrázek, Jan; Guo, Xiangxue; Shah, Apurva

    2007-01-01

    Simple sequence repeats (SSRs) in DNA sequences are composed of tandem iterations of short oligonucleotides and may have functional and/or structural properties that distinguish them from general DNA sequences. They are variable in length because of slip-strand mutations and may also affect local structure of the DNA molecule or the encoded proteins. Long SSRs (LSSRs) are common in eukaryotes but rare in most prokaryotes. In pathogens, SSRs can enhance antigenic variance of the pathogen population in a strategy that counteracts the host immune response. We analyze representations of SSRs in >300 prokaryotic genomes and report significant differences among different prokaryotes as well as among different types of SSRs. LSSRs composed of short oligonucleotides (1–4 bp length, designated LSSR1–4) are often found in host-adapted pathogens with reduced genomes that are not known to readily survive in a natural environment outside the host. In contrast, LSSRs composed of longer oligonucleotides (5–11 bp length, designated LSSR5–11) are found mostly in nonpathogens and opportunistic pathogens with large genomes. Comparisons among SSRs of different lengths suggest that LSSR1–4 are likely maintained by selection. This is consistent with the established role of some LSSR1–4 in enhancing antigenic variance. By contrast, abundance of LSSR5–11 in some genomes may reflect the SSRs' general tendency to expand rather than their specific role in the organisms' physiology. Differences among genomes in terms of SSR representations and their possible interpretations are discussed. PMID:17485665
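
    The definition of an SSR as tandem iterations of a short oligonucleotide can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `find_ssrs`, the copy-number threshold `min_copies`, and the default unit lengths (1 to 4 bp, mirroring the LSSR1–4 class) are assumptions chosen only for illustration.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=4, min_copies=5):
    """Return sorted (start, unit, copies) tuples for tandem repeats.

    Hypothetical helper: thresholds are illustrative, not taken from the paper.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit captured, then at least (min_copies - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs are already reported at the shorter unit length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

print(find_ssrs("GGG" + "AT" * 6 + "CCGT"))
```

    A real survey would also need to handle both strands, ambiguous bases, and a statistical model of expected repeat counts, none of which this sketch attempts.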

  15. Draft Genome Sequence of Actinomyces massiliensis Strain 4401292T

    PubMed Central

    Robert, Catherine; Gimenez, Grégory; Gharbi, Reem; Raoult, Didier

    2012-01-01

    A draft genome sequence of Actinomyces massiliensis, an anaerobic bacterium isolated from a patient's blood culture, is described here. CRISPR-associated proteins, insertion sequences, and toxin-antitoxin loci were found on the genome. PMID:22933754

  16. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  17. Whole-genome sequencing in bacteriology: state of the art.

    PubMed

    Dark, Michael J

    2013-10-08

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics.

  18. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics. PMID:24143115

  19. Genome Sequence of Aerococcus viridans LL1

    PubMed Central

    Qin, Nan; Zheng, Beiwen; Yang, Fengling; Chen, Yanfei; Guo, Jing; Hu, Xinjun

    2012-01-01

    Aerococcus viridans is a catalase-negative Gram-positive bacterium and has been described as an airborne organism widely distributed in the hospital environment or in clinical specimens. We isolated A. viridans strain LL1 from indoor dust samples collected by a patient. Here, we prepared a genome sequence for this strain consisting of 31 contigs totaling 1,994,039 bases and a GC content of 39.42%. PMID:22815455
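
    Summary figures of the kind quoted above (contig count, total bases, GC content) are simple to derive from a set of contig sequences. The sketch below is illustrative only; `assembly_stats` is a hypothetical helper, and the toy contigs are invented rather than taken from the LL1 assembly.

```python
def assembly_stats(contigs):
    """Return contig count, total length, and GC percentage for a list of sequences."""
    total = sum(len(c) for c in contigs)
    # Count G and C case-insensitively across all contigs.
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "contigs": len(contigs),
        "total_bases": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }

# Toy input, not real data.
print(assembly_stats(["GGCC", "ATAT", "GCAT"]))
```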

  20. Genome sequence of Aerococcus viridans LL1.

    PubMed

    Qin, Nan; Zheng, Beiwen; Yang, Fengling; Chen, Yanfei; Guo, Jing; Hu, Xinjun; Li, Lanjuan

    2012-08-01

    Aerococcus viridans is a catalase-negative Gram-positive bacterium and has been described as an airborne organism widely distributed in the hospital environment or in clinical specimens. We isolated A. viridans strain LL1 from indoor dust samples collected by a patient. Here, we prepared a genome sequence for this strain consisting of 31 contigs totaling 1,994,039 bases and a GC content of 39.42%.

  1. Complete Genome Sequences of 138 Mycobacteriophages

    PubMed Central

    2012-01-01

    Bacteriophages are the most numerous biological entities in the biosphere, and although their genetic diversity is high, it remains ill defined. Mycobacteriophages—the viruses of mycobacterial hosts—provide insights into this diversity as well as tools for manipulating Mycobacterium tuberculosis. We report here the complete genome sequences of 138 new mycobacteriophages, which—together with the 83 mycobacteriophages previously reported—represent the largest collection of phages known to infect a single common host, Mycobacterium smegmatis mc²155. PMID:22282335

  2. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    SciTech Connect

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N₂ as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H₂. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  3. Complete genome sequence of Candidatus Ruthia magnifica.

    PubMed

    Roeselers, Guus; Newton, Irene L G; Woyke, Tanja; Auchtung, Thomas A; Dilly, Geoffrey F; Dutton, Rachel J; Fisher, Meredith C; Fontanez, Kristina M; Lau, Evan; Stewart, Frank J; Richardson, Paul M; Barry, Kerrie W; Saunders, Elizabeth; Detter, John C; Wu, Dongying; Eisen, Jonathan A; Cavanaugh, Colleen M

    2010-10-27

    The hydrothermal vent clam Calyptogena magnifica (Bivalvia: Mollusca) is a member of the Vesicomyidae. Species within this family form symbioses with chemosynthetic Gammaproteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a rudimentary gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. The C. magnifica symbiont, Candidatus Ruthia magnifica, was the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced (Newton et al. 2007). Here we expand upon the original report and provide additional details complying with the emerging MIGS/MIMS standards. The complete genome exposed the genetic blueprint of the metabolic capabilities of the symbiont. Genes which were predicted to encode the proteins required for all the metabolic pathways typical of free-living chemoautotrophs were detected in the symbiont genome. These include major pathways including carbon fixation, sulfur oxidation, nitrogen assimilation, as well as amino acid and cofactor/vitamin biosynthesis. This genome sequence is invaluable in the study of these enigmatic associations and provides insights into the origin and evolution of autotrophic endosymbiosis.

  4. DNA methylation detection: bisulfite genomic sequencing analysis.

    PubMed

    Li, Yuanyuan; Tollefsbol, Trygve O

    2011-01-01

    DNA methylation, which most commonly occurs at the C5 position of cytosines within CpG dinucleotides, plays a pivotal role in many biological processes such as gene expression, embryonic development, cellular proliferation, differentiation, and chromosome stability. Aberrant DNA methylation is often associated with loss of DNA homeostasis and genomic instability leading to the development of human diseases such as cancer. The importance of DNA methylation creates an urgent demand for effective methods with high sensitivity and reliability to explore innovative diagnostic and therapeutic strategies. Bisulfite genomic sequencing developed by Frommer and colleagues was recognized as a revolution in DNA methylation analysis based on conversion of genomic DNA by using sodium bisulfite. Besides various merits of the bisulfite genomic sequencing method such as being highly qualitative and quantitative, it serves as a fundamental principle to many derived methods to better interpret the mystery of DNA methylation. Here, we present a protocol currently frequently used in our laboratory that has proven to yield optimal outcomes. We also discuss the potential technical problems and troubleshooting notes for a variety of applications in this field.
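
    The principle behind bisulfite sequencing can be sketched in a few lines of Python: sodium bisulfite converts unmethylated cytosine to uracil, which is read as thymine after PCR, while 5-methylcytosine is protected and still reads as cytosine. The functions below are a simplified in-silico illustration on a single strand, not the laboratory protocol described in the record; the names `bisulfite_convert` and `call_methylation` and the example sequence are assumptions.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T; C at a methylated position stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

def call_methylation(reference, converted):
    """Return 0-based positions where the reference C survived conversion."""
    return [i for i, (r, c) in enumerate(zip(reference.upper(), converted.upper()))
            if r == "C" and c == "C"]

reference = "ACGTCCGA"          # invented example sequence
read = bisulfite_convert(reference, methylated_positions={1})
print(read, call_methylation(reference, read))
```

    Real data additionally require handling of incomplete conversion, both strands, and sequencing errors, which this sketch ignores.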

  5. Sequencing Crop Genomes: A Gateway to Improve Tropical Agriculture

    PubMed Central

    Thottathil, Gincy Paily; Jayasekaran, Kandakumar; Othman, Ahmad Sofiman

    2016-01-01

    Agricultural development in the tropics lags behind development in the temperate latitudes due to the lack of advanced technology, and various biotic and abiotic factors. To cope with the increasing demand for food and other plant-based products, improved crop varieties have to be developed. To breed improved varieties, a better understanding of crop genetics is necessary. With the advent of next-generation DNA sequencing technologies, many important crop genomes have been sequenced. Primary importance has been given to food crops, including cereals, tuber crops, vegetables, and fruits. The DNA sequence information is extremely valuable for identifying key genes controlling important agronomic traits and for identifying genetic variability among the cultivars. However, massive DNA re-sequencing and gene expression studies have to be performed to substantially improve our understanding of crop genetics. Application of the knowledge obtained from the genomes, transcriptomes, expression studies, and epigenetic studies would enable the development of improved varieties and may lead to a second green revolution. The applications of next generation DNA sequencing technologies in crop improvement, its limitations, future prospects, and the features of important crop genome projects are reviewed herein. PMID:27019684

  6. Genome sequence of Halobacterium species NRC-1

    PubMed Central

    Ng, Wailap Victor; Kennedy, Sean P.; Mahairas, Gregory G.; Berquist, Brian; Pan, Min; Shukla, Hem Dutt; Lasky, Stephen R.; Baliga, Nitin S.; Thorsson, Vesteinn; Sbrogna, Jennifer; Swartzell, Steven; Weir, Douglas; Hall, John; Dahl, Timothy A.; Welti, Russell; Goo, Young Ah; Leithauser, Brent; Keller, Kim; Cruz, Randy; Danson, Michael J.; Hough, David W.; Maddocks, Deborah G.; Jablonski, Peter E.; Krebs, Mark P.; Angevine, Christine M.; Dale, Heather; Isenbarger, Thomas A.; Peck, Ronald F.; Pohlschroder, Mechthild; Spudich, John L.; Jung, Kwang-Hwan; Alam, Maqsudul; Freitas, Tracey; Hou, Shaobin; Daniels, Charles J.; Dennis, Patrick P.; Omer, Arina D.; Ebhardt, Holger; Lowe, Todd M.; Liang, Ping; Riley, Monica; Hood, Leroy; DasSarma, Shiladitya

    2000-01-01

    We report the complete sequence of an extreme halophile, Halobacterium sp. NRC-1, harboring a dynamic 2,571,010-bp genome containing 91 insertion sequences representing 12 families and organized into a large chromosome and 2 related minichromosomes. The Halobacterium NRC-1 genome codes for 2,630 predicted proteins, 36% of which are unrelated to any previously reported. Analysis of the genome sequence shows the presence of pathways for uptake and utilization of amino acids, active sodium-proton antiporter and potassium uptake systems, sophisticated photosensory and signal transduction pathways, and DNA replication, transcription, and translation systems resembling more complex eukaryotic organisms. Whole proteome comparisons show the definite archaeal nature of this halophile with additional similarities to the Gram-positive Bacillus subtilis and other bacteria. The ease of culturing Halobacterium and the availability of methods for its genetic manipulation in the laboratory, including construction of gene knockouts and replacements, indicate this halophile can serve as an excellent model system among the archaea. PMID:11016950

  7. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    USDA-ARS's Scientific Manuscript database

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  8. Identification of ancient remains through genomic sequencing

    PubMed Central

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  9. The Genome Russia project: closing the largest remaining omission on the world Genome map.

    PubMed

    Oleksyk, Taras K; Brukhin, Vladimir; O'Brien, Stephen J

    2015-01-01

    We are witnessing the great era of genome exploration of the world, as genetic variation in people is being detailed across multiple varied world populations in an effort unprecedented since the first human genome sequence appeared in 2001. However, these efforts have yet to produce a comprehensive mapping of humankind, because important regions of modern human civilization remain unexplored. The Genome Russia Project promises to fill one of the largest gaps, the expansive regions across the Russian Federation, informing not just medical genomics of the territories, but also the migration settlements of historic and pre-historic Eurasian peoples.

  10. Swine Genome Sequencing Consortium (SGSC): A Strategic Roadmap for Sequencing The Pig Genome

    PubMed Central

    Schook, Lawrence B.; Beever, Jonathan E.; Rogers, Jane; Humphray, Sean; Archibald, Alan; Chardon, Patrick; Milan, Denis; Rohrer, Gary; Eversole, Kellye

    2005-01-01

    The Swine Genome Sequencing Consortium (SGSC) was formed in September 2003 by academic, government and industry representatives to provide international coordination for sequencing the pig genome. The SGSC’s mission is to advance biomedical research for animal production and health by the development of DNA-based tools and products resulting from the sequencing of the swine genome. During the past 2 years, the SGSC has met bi-annually to develop a strategic roadmap for creating the required scientific resources, to integrate existing physical maps, and to create a sequencing strategy that captured international participation and a broad funding base. During the past year, SGSC members have integrated their respective physical mapping data with the goal of creating a minimal tiling path (MTP) that will be used as the sequencing template. During the recent Plant and Animal Genome meeting (January 16, 2005, San Diego, CA), presentations demonstrated that a human–pig comparative map has been completed, BAC fingerprint contigs (FPC) for each of the autosomes and X chromosome have been constructed and that BAC end-sequencing has permitted, through BLAST analysis and RH-mapping, anchoring of the contigs. Thus, significant progress has been made towards the creation of an MTP. In addition, whole-genome (WG) shotgun libraries have been constructed and are currently being sequenced in various laboratories around the globe. Thus, a hybrid sequencing approach in which 3x coverage of BACs comprising the MTP and 3x of the WG-shotgun libraries will be used to develop a draft 6x coverage of the pig genome. PMID:18629187

  11. Detecting and Analyzing DNA Sequencing Errors: Toward a Higher Quality of the Bacillus subtilis Genome Sequence

    PubMed Central

    Médigue, Claudine; Rose, Matthias; Viari, Alain; Danchin, Antoine

    1999-01-01

    During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detection of such errors with a method based on protein similarity matching is only possible when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses, the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases in-frame termination codons or frameshifts were not sequencing errors but confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used for checking the quality of the sequences produced by any prokaryotic genome sequencing project. PMID:10568751
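
    The kinds of error this abstract describes (in-frame stop codons and frameshifts in putative genes) can be illustrated with a very small check. This is a minimal sketch, not the authors' method, which combines a search for translational initiation/termination sites with coding-region prediction. It assumes the standard stop codons TAA, TAG and TGA, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard stop codons


def in_frame_stops(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of stop codons that occur in frame
    before the final codon of a putative coding sequence."""
    seq = cds.upper()
    # Start of the last complete codon; it is expected to be the real stop.
    last_codon_start = len(seq) - len(seq) % 3 - 3
    return [
        i for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def length_suggests_frameshift(cds: str) -> bool:
    """A putative gene whose length is not a multiple of 3 cannot be read
    as whole codons, which may point to an inserted or deleted base."""
    return len(cds) % 3 != 0


clean = "ATGAAAGGGTAA"       # ATG AAA GGG TAA
suspect = "ATGAAATAGGGGTAA"  # ATG AAA TAG GGG TAA
print(in_frame_stops(clean), in_frame_stops(suspect))
```

    As the abstract notes, a flagged position is only a candidate: resequencing may show that the stop codon or frameshift is genuinely present in the chromosome.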

  12. Genome sequence of Aspergillus luchuensis NBRC 4314.

    PubMed

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-Ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-12-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. Whereas yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and the N- and O-glycosylation systems, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase related to citric acid production and fermentation at a low pH, as well as a unique glutamic peptidase, were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with the A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds to improve awamori fermentation. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  13. Complete genome sequence of Actinosynnema mirum type strain (101T)

    SciTech Connect

    Land, Miriam; Lapidus, Alla; Mayilraj, Shanmugam; Chen, Feng; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia; Brettin, Thomas; Detter, John C.; Han, Cliff; Chain, Patrick; Tindall, Brian; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actinosynnemataceae, a rapidly growing family within the actinobacterial suborder Pseudonocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Actinosynnema mirum type strain (101T)

    SciTech Connect

    Land, Miriam L; Lapidus, Alla L.; Mayilraj, Shanmugam; Chen, Feng; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Chertkov, Olga; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Rohde, Manfred; Goker, Markus; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Chain, Patrick S. G.; Tindall, Brian; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-01-01

    Actinosynnema mirum Hasegawa et al. 1978 is the type species of the genus, and is of phylogenetic interest because of its central phylogenetic location in the Actinosynnemataceae, a rapidly growing family within the actinobacterial suborder Pseudonocardineae. A. mirum is characterized by its motile spores borne on synnemata and as a producer of nocardicin antibiotics. It is capable of growing aerobically and under a moderate CO2 atmosphere. The strain is a Gram-positive, aerial and substrate mycelium producing bacterium, originally isolated from a grass blade collected from the Raritan River, New Jersey. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Actinosynnemataceae, and only the second sequence from the actinobacterial suborder Pseudonocardineae. The 8,248,144 bp long single replicon genome with its 7100 protein-coding and 77 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  16. Complete genome sequence of Syntrophobacter fumaroxidans strain (MPOB(T)).

    PubMed

    Plugge, Caroline M; Henstra, Anne M; Worm, Petra; Swarts, Daan C; Paulitsch-Fuchs, Astrid H; Scholten, Johannes C M; Lykidis, Athanasios; Lapidus, Alla L; Goltsman, Eugene; Kim, Edwin; McDonald, Erin; Rohlin, Lars; Crable, Bryan R; Gunsalus, Robert P; Stams, Alfons J M; McInerney, Michael J

    2012-10-10

    Syntrophobacter fumaroxidans strain MPOB(T) is the best-studied species of the genus Syntrophobacter. The species is of interest because of its anaerobic syntrophic lifestyle, its involvement in the conversion of propionate to acetate, H2 and CO2 during the overall degradation of organic matter, and its release of products that serve as substrates for other microorganisms. The strain is able to ferment fumarate in pure culture to CO2 and succinate, and is also able to grow as a sulfate reducer with propionate as an electron donor. This is the first complete genome sequence of a member of the genus Syntrophobacter and a member genus in the family Syntrophobacteraceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,990,251 bp long genome with its 4,098 protein-coding and 81 RNA genes is a part of the Microbial Genome Program (MGP) and the Genomes to Life (GTL) Program project.

  17. Complete genome sequence of Syntrophobacter fumaroxidans strain (MPOBT)

    PubMed Central

    Plugge, Caroline M.; Henstra, Anne M.; Worm, Petra; Swarts, Daan C.; Paulitsch-Fuchs, Astrid H.; Scholten, Johannes C.M.; Lykidis, Athanasios; Lapidus, Alla L.; Goltsman, Eugene; Kim, Edwin; McDonald, Erin; Rohlin, Lars; Crable, Bryan R.; Gunsalus, Robert P.; Stams, Alfons J.M.; McInerney, Michael J.

    2012-01-01

    Syntrophobacter fumaroxidans strain MPOBT is the best-studied species of the genus Syntrophobacter. The species is of interest because of its anaerobic syntrophic lifestyle, its involvement in the conversion of propionate to acetate, H2 and CO2 during the overall degradation of organic matter, and its release of products that serve as substrates for other microorganisms. The strain is able to ferment fumarate in pure culture to CO2 and succinate, and is also able to grow as a sulfate reducer with propionate as an electron donor. This is the first complete genome sequence of a member of the genus Syntrophobacter and a member genus in the family Syntrophobacteraceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,990,251 bp long genome with its 4,098 protein-coding and 81 RNA genes is a part of the Microbial Genome Program (MGP) and the Genomes to Life (GTL) Program project. PMID:23450070

  18. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  19. Transforming clinical microbiology with bacterial genome sequencing.

    PubMed

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  20. Transforming clinical microbiology with bacterial genome sequencing

    PubMed Central

    2016-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263

  1. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  2. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    PubMed Central

    Otto, Thomas D; Gomes, Leonardo HF; Alves-Ferreira, Marcelo; de Miranda, Antonio B; Degrave, Wim M

    2008-01-01

    Background Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular
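
    The idea of flagging repetitive sequence in unassembled reads can be illustrated by counting short substrings (k-mers) that recur across a read set. This is only a sketch of one common, simple technique and is not the ReRep pipeline, whose component tools are not named in this abstract. The function name, the `k` and `min_count` parameters, and the toy reads are all hypothetical, and reverse-complement matches are ignored for brevity.

```python
from collections import Counter


def repeated_kmers(reads: list[str], k: int = 8, min_count: int = 3) -> dict[str, int]:
    """Count every k-mer across all reads and keep those seen at least
    min_count times; such k-mers are candidates for repetitive sequence."""
    counts: Counter[str] = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads, invented purely for illustration.
toy_reads = ["ACGTACGTAC", "TTACGTACGG", "GGGACGTACA"]
print(repeated_kmers(toy_reads, k=6, min_count=3))
```

    In practice the thresholds would need tuning to read length and sequencing depth, in the same spirit as the parameter tuning the abstract mentions.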

  3. ReRep: computational detection of repetitive sequences in genome survey sequences (GSS).

    PubMed

    Otto, Thomas D; Gomes, Leonardo H F; Alves-Ferreira, Marcelo; de Miranda, Antonio B; Degrave, Wim M

    2008-09-09

    Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing

  4. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^−9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^−14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^−9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^−11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  5. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.
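
    The abstract separates common variants (MAF≥1%) from rare variants (MAF<1%). A minimal sketch of how minor allele frequency (MAF) is computed from genotype counts at a biallelic site, and how that 1% cut-off would be applied, is shown below. The function names and genotype counts are hypothetical and are not data from the study.

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """MAF at a biallelic site from diploid genotype counts:
    the frequency of whichever allele is less common."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float, threshold: float = 0.01) -> str:
    """Apply the abstract's cut-off: MAF >= 1% is common, MAF < 1% is rare."""
    return "common" if maf >= threshold else "rare"


# Hypothetical genotype counts, for illustration only.
for counts in [(90, 10, 0), (998, 2, 0)]:
    maf = minor_allele_frequency(*counts)
    print(counts, round(maf, 4), frequency_class(maf))
```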

  6. Complete genome sequence of Halopiger xanaduensis type strain (SH6T)

    SciTech Connect

    Anderson, Iain; Tindall, Brian; Rohde, Manfred; Lucas, Susan; Han, James; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Peters, Lin; Pati, Amrita; Mikhailova, Natalia; Pagani, Ioanna; Teshima, Hazuki; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2012-01-01

    Halopiger xanaduensis is the type species of the genus Halopiger and belongs to the euryarchaeal family Halobacteriaceae. H. xanaduensis strain SH-6, which is designated as the type strain, was isolated from the sediment of a salt lake in Inner Mongolia, Lake Shangmatala. Like other members of the family Halobacteriaceae, it is an extreme halophile requiring at least 2.5 M salt for growth. We report here the sequencing and annotation of the 4,355,268 bp genome, which includes one chromosome and three plasmids. This genome is part of a Joint Genome Institute (JGI) Community Sequencing Program (CSP) project to sequence diverse haloarchaeal genomes.

  7. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  8. Los Alamos Science: The Human Genome Project. Number 20, 1992

    DOE R&D Accomplishments Database

    Cooper, N. G.; Shea, N. eds.

    1992-01-01

    This document provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what some of its major five-year goals are, what major technological challenges lie ahead of the project, and what benefits the field of biology, as well as society, can expect to see from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff who are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  9. Los Alamos Science: The Human Genome Project. Number 20, 1992

    SciTech Connect

    Cooper, N G; Shea, N

    1992-01-01

    This article provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what some of its major five-year goals are, what major technological challenges lie ahead of the project, and what benefits the field of biology, as well as society, can expect to see from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff who are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  10. Porcine parvovirus: DNA sequence and genome organization.

    PubMed

    Ranz, A I; Manclús, J J; Díaz-Aroca, E; Casal, J I

    1989-10-01

    We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization of the PPV genome is similar to those of other autonomous parvoviruses. There are two large open reading frames (ORFs) that almost entirely cover the genome, both located in the same frame of the complementary strand. The left ORF encodes the non-structural protein NS1 and the right ORF encodes the capsid proteins (VP1, VP2 and VP3). Promoter analysis, location of splicing sites and putative amino acid sequences for the viral proteins show a high homology of PPV with feline panleukopenia virus and canine parvoviruses (FPV and CPV) and rodent parvovirus. Therefore we conclude that PPV is related to the Kilham rat virus (KRV) group of autonomous parvoviruses formed by KRV, minute virus of mice, Lu III, H-1, FPV and CPV.

  11. Viral sequences integrated into plant genomes.

    PubMed

    Harper, Glyn; Hull, Roger; Lockhart, Ben; Olszewski, Neil

    2002-01-01

    Sequences of various DNA plant viruses have been found integrated into the host genome. There are two forms of integrant: those that can form episomal viral infections and those that cannot. Integrants of three pararetroviruses, Banana streak virus (BSV), Tobacco vein clearing virus (TVCV), and Petunia vein clearing virus (PVCV), can generate episomal infections in certain hybrid plant hosts in response to stress. In the case of BSV and TVCV, one of the parents contains the integrant but it has not been seen to be activated in that parent; the other parent does not contain the integrant. The number of integrant loci is low for BSV and PVCV and high in TVCV. The structure of the integrants is complex, and it is thought that episomal virus is released by recombination and/or reverse transcription. Geminiviral and pararetroviral sequences are found in plant genomes although not so far associated with a virus disease. It appears that integration of viral sequences is widespread in the plant kingdom and has been occurring for a long period of time.

  12. The Chlamydomonas genome project: a decade on

    PubMed Central

    Blaby, Ian K.; Blaby-Haas, Crysten; Tourasse, Nicolas; Hom, Erik F. Y.; Lopez, David; Aksoy, Munevver; Grossman, Arthur; Umen, James; Dutcher, Susan; Porter, Mary; King, Stephen; Witman, George; Stanke, Mario; Harris, Elizabeth H.; Goodstein, David; Grimwood, Jane; Schmutz, Jeremy; Vallon, Olivier; Merchant, Sabeeha S.; Prochnik, Simon

    2014-01-01

    The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis and micronutrient homeostasis. Ten years since its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the “omics” era. Housed at Phytozome, the Joint Genome Institute’s (JGI) plant genomics portal, the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of RNA-Seq data. Here, we present the past, present and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes. PMID:24950814

  13. Draft Genome Sequence of Mycobacterium chimaera Type ...

    EPA Pesticide Factsheets

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  14. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs are conserved during evolution; of these, all but one, which occurs in both liverworts and mosses, are shared only by the two liverworts, the two mosses or the two hornworts. SSRs are known as DNA tracts having high mutation rates; however, according to our observations, they can still evolve slowly. The conservation of these SSRs suggests that they are under strong selection and could play critical roles in maintaining gene functions.

  15. The Riken mouse genome encyclopedia project.

    PubMed

    Hayashizaki, Yoshihide

    2003-01-01

    The Riken mouse genome encyclopedia is a comprehensive full-length cDNA collection and sequence database. High-level functional annotation is based on sequence homology search, expression profiling, mapping and protein-protein interactions. More than 1000000 clones prepared from 163 tissues were end-sequenced and classified into 128000 clusters, and 60000 representative clones were fully sequenced representing 24000 clear protein-encoding genes. The application of the mouse genome database for positional cloning and gene network regulation analysis is reported.

  16. Effort to map and sequence the human genome makes significant progress

    SciTech Connect

    Borman, S.

    1994-11-07

    The Human Genome Project, an international research effort to map and sequence the genomes of humans and selected model organisms, is making significant progress toward its goals four years into its projected 15-year life. A detailed human genetic linkage map has been developed ahead of time. A physical map, consisting of overlapping pieces of DNA, is only slightly behind schedule. Base-by-base sequencing of the human genome is lagging, but sequencing of model organisms is moving along very well, with the first complete eukaryotic genome likely to be completed within two years. Human Genome Project sponsorship of a map that would show the location of expressed human genes is still in the planning stage. However, such maps have been and are being produced privately on a large scale--a state of affairs that has stirred up considerable controversy about whether the ''market'' for such data is being cornered by proprietary interests.

  17. Mapping our genes: Federal genome projects: How vast. How fast. : Volume 1, Contractor reports

    SciTech Connect

    Reisher, S.R.; Friedmann, T.; Glover, J.; Heilbron, J.L.; Judson, H.F.

    1988-02-01

    This report contains contractor contributions solicited by the US Office of Technology Assessment in support of its recommendations for federal organization of the human genome project. The individual reports contained herein are entitled: Bibliometric analysis of work on human gene mapping; Medical implications of extensive physical and sequence characterization of the human genome; Mapping the human genome: Some implications; Mapping and sequencing the human genome: Considerations from the history of particle accelerators; Mapping the human genome: Historical background; Long-term implications of mapping and sequencing the human genome: Ethical and philosophical implications. Each report is also separately abstracted and indexed for the Energy Data Base. (DT)

  18. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    PubMed Central

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256

  19. The human genome project: Prospects and implications for clinical medicine

    SciTech Connect

    Green, E.D.; Waterston, R.H.

    1991-10-09

    The recently initiated human genome project is a large international effort to elucidate the genetic architecture of the genomes of man and several model organisms. The initial phases of this endeavor involve the establishment of rough blueprints (maps) of the genetic landscape of these genomes, with the long-term goal of determining their precise nucleotide sequences and identifying the genes. The knowledge gained by these studies will provide a vital tool for the study of many biologic processes and will have a profound impact on clinical medicine.

  20. Aligning Two Genomic Sequences That Contain Duplications

    NASA Astrophysics Data System (ADS)

    Hou, Minmei; Riemer, Cathy; Berman, Piotr; Hardison, Ross C.; Miller, Webb

    It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

  1. The Modest Beginnings of One Genome Project

    PubMed Central

    Kaback, David B.

    2013-01-01

    One of the top things on a geneticist’s wish list has to be a set of mutants for every gene in their particular organism. Such a set was produced for the yeast, Saccharomyces cerevisiae near the end of the 20th century by a consortium of yeast geneticists. However, the functional genomic analysis of one chromosome, its smallest, had already begun more than 25 years earlier as a project that was designed to define most or all of that chromosome’s essential genes by temperature-sensitive lethal mutations. When far fewer than expected genes were uncovered, the relatively new field of molecular cloning enabled us and indeed, the entire community of yeast researchers to approach this problem more definitively. These studies ultimately led to cloning, genomic sequencing, and the production and phenotypic analysis of the entire set of knockout mutations for this model organism as well as a better concept of what defines an essential function, a wish fulfilled that enables this model eukaryote to continue at the forefront of research in modern biology. PMID:23733847

  2. A field guide to whole-genome sequencing, assembly and annotation

    PubMed Central

    Ekblom, Robert; Wolf, Jochen B W

    2014-01-01

    Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects. PMID:25553065

  3. [Genomic variation in maize]. Final project report

    SciTech Connect

    Rivin, C.J.

    1991-12-31

    These studies have sought to learn how different DNA sequences and sequence arrangements contribute to genome plasticity in maize. We describe quantitative variation among maize inbred lines for tandemly arrayed and dispersed repeated DNA sequences and gene families, and qualitative variation for sequences homologous to the Mutator family of transposons. The potential of these sequences to undergo unequal crossing over, non-allelic (ectopic) recombination and transposition makes them a source of genome instability. We have found examples of rapid genomic change involving these sequences in F1 hybrids, tissue culture cells and regenerated plants. We describe the repetitive portion of the maize genome as composed primarily of sequences that vary markedly in copy number among different genetic stocks. The most highly variable is the 185 bp repeat associated with the heterochromatic chromosome knobs. Even in lines without visible knobs, there is a considerable quantity of tandemly arrayed repeats. We also found a high degree of variability for the tandemly arrayed 5S and ribosomal DNA repeats. While such variation might be expected as the result of unequal cross-over, we were surprised to find considerable variation among lower copy number, dispersed repeats as well. One highly repeated sequence that showed a complex tandem and dispersed arrangement stood out as showing no detectable variability among the maize lines. In striking contrast to the variability seen between the inbred stocks, individuals within a stock were indistinguishable with regard to their repeated sequence multiplicities.

  4. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.

    PubMed

    Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D

    2016-08-17

    Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. Assembly errors and a lack of annotation of functional elements significantly limit the utility of

  5. Ensemble analysis of adaptive compressed genome sequencing strategies.

    PubMed

    Taghavi, Zeinab

    2014-01-01

    Acquiring genomes at single-cell resolution has many applications, such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing, in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process, which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows in proportion to the log of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample. The modified resource allocation method

  6. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map

    SciTech Connect

    Kelleher, Colin; CHIU, Dr. R.; Shin, Dr. H.; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; Difazio, Stephen P.

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
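
    The coverage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the 485 Mb genome-size estimate is the denominator behind the reported 79%, which the abstract implies but does not state outright.

```python
# Figures quoted in the abstract (all sizes in Mb).
genome_size_mb = 485.0       # estimated genome size, reported as 485 ± 10 Mb
aligned_total_mb = 384.0     # sequence covered by aligned physical-map contigs
aligned_linkage_mb = 295.0   # aligned sequence within linkage-group assemblies
linkage_fraction = 0.96      # reported share of linkage-group sequence covered
fold_coverage = 9.4          # reported fold coverage of the physical map

# Assumption: the 79% figure is aligned sequence divided by the genome estimate.
share_of_genome = 100.0 * aligned_total_mb / genome_size_mb

# Implied total size of the linkage-group assemblies (295 Mb is 96% of it).
implied_linkage_total_mb = aligned_linkage_mb / linkage_fraction

# Implied cumulative length of fingerprinted clones at 9.4-fold coverage.
implied_clone_length_mb = fold_coverage * genome_size_mb

print(f"Aligned share of genome: {share_of_genome:.1f}%")
print(f"Implied linkage-group assembly size: {implied_linkage_total_mb:.1f} Mb")
print(f"Implied total clone length: {implied_clone_length_mb:.0f} Mb")
```

    Under that assumption the aligned share rounds to the reported 79%, and the linkage-group assemblies come out at roughly 307 Mb.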

  7. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

    PubMed

    Moraes, Fernanda; Góes, Andréa

    2016-05-06

    The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.

  8. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  9. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, 
Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  10. Complete genome sequence of Desulfomicrobium baculatum type strain (XT)

    SciTech Connect

    Copeland, Alex; Spring, Stefan; Goker, Markus; Schneider, Susanne; Lapidus, Alla; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C; Meincke, Linda; Sims, David; Brettin, Thomas; Detter, John C; Han, Cliff; Chain, Patrick; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C; Lucas, Susan

    2009-05-20

    Desulfomicrobium baculatum is the type species of the genus Desulfomicrobium, which is the type genus of the family Desulfomicrobiaceae. It is of phylogenetic interest because of the isolated location of the family Desulfomicrobiaceae within the order Desulfovibrionales. D. baculatum strain XT is a Gram-negative, motile, sulfate-reducing bacterium isolated from water-saturated manganese carbonate ore. It is strictly anaerobic and does not require NaCl for growth, although NaCl concentrations up to 6% (w/v) are tolerated. The metabolism is respiratory or fermentative. In the presence of sulfate, pyruvate and lactate are incompletely oxidized to acetate and CO2. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the deltaproteobacterial family Desulfomicrobiaceae, and this 3,942,657 bp long single replicon genome with its 3494 protein-coding and 72 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. Complete genome sequence of Kytococcus sedentarius type strain (541T)

    PubMed Central

    Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavrommatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Schneider, Susanne; Göker, Markus; Pukall, Rüdiger; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-01-01

    Kytococcus sedentarius (ZoBell and Upham 1944) Stackebrandt et al. 1995 is the type strain of the species, and is of phylogenetic interest because of its location in the Dermacoccaceae, a poorly studied family within the actinobacterial suborder Micrococcineae. Kytococcus sedentarius is known for the production of oligoketide antibiotics as well as for its role as an opportunistic pathogen causing valve endocarditis, hemorrhagic pneumonia, and pitted keratolysis. It is strictly aerobic and can only grow when several amino acids are provided in the medium. The strain described in this report is a free-living, nonmotile, Gram-positive bacterium, originally isolated from a marine environment. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the family Dermacoccaceae and the 2,785,024 bp long single replicon genome with its 2639 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304632

  12. Complete genome sequence of Halanaerobium praevalens type strain (GSLT)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Chertkov, Olga; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Pagani, Ioanna; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Brambilla, Evelyne-Marie; Kannan, K. Palani; Rohde, Manfred; Tindall, Brian; Goker, Markus; Detter, J. Chris; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2011-01-01

    Halanaerobium praevalens Zeikus et al. 1984 is the type species of the genus Halanaerobium, which in turn is the type genus of the family Halanaerobiaceae. The species is of interest because it is able to reduce a variety of nitro-substituted aromatic compounds at a high rate, and because of its ability to degrade organic pollutants. The strain is also of interest because it functions as a hydrolytic bacterium, fermenting complex organic matter and producing intermediary metabolites for other trophic groups such as sulfate-reducing and methanogenic bacteria. It is further reported as being involved in carbon removal in the Great Salt Lake, its source of isolation. This is the first completed genome sequence of a representative of the genus Halanaerobium and the second genome sequence from a type strain of the family Halanaerobiaceae. The 2,309,262 bp long genome with its 2,110 protein-coding and 70 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Desulfomicrobium baculatum type strain (XT)

    SciTech Connect

    Copeland, A; Spring, Stefan; Goker, Markus; Schneider, Susan; Lapidus, Alla L.; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Meincke, Linda; Sims, David; Brettin, Tom; Detter, J. Chris; Han, Cliff; Chain, Patrick S. G.; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-01-01

    Desulfomicrobium baculatum is the type species of the genus Desulfomicrobium, which is the type genus of the family Desulfomicrobiaceae. It is of phylogenetic interest because of the isolated location of the family Desulfomicrobiaceae within the order Desulfovibrionales. D. baculatum strain XT is a Gram-negative, motile, sulfate-reducing bacterium isolated from water-saturated manganese carbonate ore. It is strictly anaerobic and does not require NaCl for growth, although NaCl concentrations up to 6% (w/v) are tolerated. The metabolism is respiratory or fermentative. In the presence of sulfate, pyruvate and lactate are incompletely oxidized to acetate and CO2. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the deltaproteobacterial family Desulfomicrobiaceae, and this 3,942,657 bp long single replicon genome with its 3494 protein-coding and 72 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Denitrovibrio acetiphilus type strain (N2460).

    PubMed

    Kiss, Hajnalka; Lang, Elke; Lapidus, Alla; Copeland, Alex; Nolan, Matt; Glavina Del Rio, Tijana; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Spring, Stefan; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-06-15

    Denitrovibrio acetiphilus Myhr and Torsvik 2000 is the type species of the genus Denitrovibrio in the bacterial family Deferribacteraceae. It is of phylogenetic interest because there are only six genera described in the family Deferribacteraceae. D. acetiphilus was isolated as a representative of a population reducing nitrate to ammonia in a laboratory column simulating the conditions in off-shore oil recovery fields. When nitrate was added to this column undesirable hydrogen sulfide production was stopped because the sulfate reducing populations were superseded by these nitrate reducing bacteria. Here we describe the features of this marine, mesophilic, obligately anaerobic organism respiring by nitrate reduction, together with the complete genome sequence, and annotation. This is the second complete genome sequence of the order Deferribacterales and the class Deferribacteres, which is the sole class in the phylum Deferribacteres. The 3,222,077 bp genome with its 3,034 protein-coding and 51 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Research ethics and the challenge of whole-genome sequencing

    PubMed Central

    McGuire, Amy L.; Caulfield, Timothy; Cho, Mildred K.

    2008-01-01

    The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants’ relatives; and the future use of samples and data taken for whole-genome sequencing. Although the issues are not new, we discuss their implications for personal genomics and provide recommendations for appropriate management in the context of research involving individual whole-genome sequencing. PMID:18087293

  16. Reassociation kinetics-based approach for partial genome sequencing of the cattle tick, Rhipicephalus (Boophilus) microplus

    PubMed Central

    2010-01-01

    Background The size and repetitive nature of the Rhipicephalus microplus genome make obtaining a full genome sequence fiscally and technically problematic. To selectively obtain gene-enriched regions of this tick's genome, Cot filtration was performed, and Cot-filtered DNA was sequenced via 454 FLX pyrosequencing. Results The sequenced Cot-filtered genomic DNA was assembled with an EST-based gene index of 14,586 unique entries where each EST served as a potential "seed" for scaffold formation. The new sequence assembly extended the lengths of 3,913 of the 14,586 gene index entries. Over half of the extensions corresponded to extensions of over 30 amino acids. To survey the repetitive elements in the tick genome, the complete sequences of five BAC clones were determined. Both Class I and II transposable elements were found. Comparison of the BAC and Cot filtration data indicates that Cot filtration was highly successful in filtering repetitive DNA out of the genomic DNA used in 454 sequencing. Conclusion Cot filtration is a very useful strategy to incorporate into genome sequencing projects on organisms with large genomes that contain high percentages of repetitive, difficult-to-assemble genomic DNA. Combining the Cot selection approach with 454 sequencing and assembly with a pre-existing EST database as seeds resulted in extensions of 27% of the members of the EST database. PMID:20540747
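
    The 27% figure in this record follows directly from the two counts it quotes. A minimal Python sketch of that arithmetic, using only numbers from the abstract; the variable names are illustrative and not taken from the paper:

```python
# Counts quoted in the abstract above.
gene_index_entries = 14_586   # unique entries in the EST-based gene index
extended_entries = 3_913      # entries whose length the new assembly extended

# Share of gene index entries that were extended, as a whole percentage.
extended_fraction = extended_entries / gene_index_entries
extended_percent = round(extended_fraction * 100)

print(f"{extended_percent}% of gene index entries were extended")
```

    The unrounded fraction is about 0.268, which rounds to the 27% reported in the conclusion.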

  17. Permanent draft genome sequence of the gliding predator Saprospira grandis strain Sa g1 (= HR1)

    SciTech Connect

    Mavromatis, K; Chertkov, Olga; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Tice, Hope; Glavina Del Rio, Tijana; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Huntemann, Marcel; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Brambilla, Evelyne-Marie; Rohde, Manfred; Spring, Stefan; Goker, Markus; Detter, J. Chris; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Saprospira grandis Gross et al. 1911 is a member of the Saprospiraceae, a family in the class 'Sphingobacteria' that remains poorly characterized at the genomic level. The species is known for preying on other marine bacteria via 'ixotrophy'. S. grandis strain Sa g1 was isolated from decaying crab carapace in France and was selected for genome sequencing because of its isolated location in the tree of life. Only one type strain genome has been published so far from the Saprospiraceae, while the sequence of strain Sa g1 represents the second genome to be published from a non-type strain of S. grandis. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,495,250 bp long Improved-High-Quality draft of the genome with its 3,536 protein-coding and 62 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. Draft Genome Sequence of the Fungus Trametes hirsuta 072.

    PubMed

    Pavlov, Andrey R; Tyazhelova, Tatiana V; Moiseenko, Konstantin V; Vasina, Daria V; Mosunova, Olga V; Fedorova, Tatiana V; Maloshenok, Lilya G; Landesman, Elena O; Bruskin, Sergei A; Psurtseva, Nadezhda V; Slesarev, Alexei I; Kozyavkin, Sergei A; Koroleva, Olga V

    2015-11-19

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs).

  19. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  20. Detecting long tandem duplications in genomic sequences

    PubMed Central

    2012-01-01

    Background Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. Conclusions ReD Tandem identifies large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations. PMID:22568762

  1. Rapid whole genome sequencing and precision neonatology.

    PubMed

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow, or perceived as too impractical, for the initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and turnaround time of testing decrease, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  2. prot4EST: Translating Expressed Sequence Tags from neglected genomes

    PubMed Central

    Wasmuth, James D; Blaxter, Mark L

    2004-01-01

    Background The genomes of an increasing number of species are being investigated through generation of expressed sequence tags (ESTs). However, ESTs are prone to sequencing errors and typically define incomplete transcripts, making downstream annotation difficult. Annotation would be greatly improved with robust polypeptide translations. Many current solutions for EST translation require a large number of full-length gene sequences for training purposes, a resource that is not available for the majority of EST projects. Results As part of our ongoing EST programs investigating these "neglected" genomes, we have developed a polypeptide prediction pipeline, prot4EST. It incorporates freely available software to produce final translations that are more accurate than those derived from any single method. We show that this integrated approach goes a long way to overcoming the deficit in training data. Conclusions prot4EST provides a portable EST translation solution and can be usefully applied to >95% of EST projects to improve downstream annotation. It is freely available from . PMID:15571632

  3. An improved chloroplast DNA extraction procedure for whole plastid genome sequencing.

    PubMed

    Shi, Chao; Hu, Na; Huang, Hui; Gao, Ju; Zhao, You-Jie; Gao, Li-Zhi

    2012-01-01

    Chloroplast genomes supply valuable genetic information for evolutionary and functional studies in plants. The past five years have witnessed a dramatic increase in the number of completely sequenced chloroplast genomes with the application of second-generation sequencing technology in plastid genome sequencing projects. However, cost-effective high-throughput chloroplast DNA (cpDNA) extraction has become a major bottleneck restricting the application, as conventional methods struggle to balance the quality and yield of cpDNA. We first tested two traditional methods to isolate cpDNA from three species, Oryza brachyantha, Leersia japonica and Prinsepia utilis. Both of them failed to obtain properly defined cpDNA bands. We therefore developed a simple but efficient method based on sucrose gradients and found that the modified protocol worked efficiently to isolate the cpDNA from the same three plant species. We sequenced the isolated DNA samples with Illumina (Solexa) sequencing technology and tested cpDNA purity by aligning sequence reads to the reference chloroplast genomes, showing that the reference genome was properly covered. We show that 40-50% cpDNA purity is achieved with our method. Here we provide an improved method used to isolate cpDNA from angiosperms. The Illumina sequencing results suggest that the isolated cpDNA has reached enough yield and sufficient purity to perform subsequent genome assembly. The cpDNA isolation protocol thus will be widely applicable to plant chloroplast genome sequencing projects.

  4. Draft Genome Sequences of Campylobacter jejuni Strains That Cause Abortion in Livestock

    PubMed Central

    Weis, Allison M.; Clothier, Kristin A.; Huang, Bihua C.; Kong, Nguyet

    2016-01-01

    Campylobacter jejuni is an intestinal bacterium that can cause abortion in livestock. This publication announces the public release of 15 Campylobacter jejuni genome sequences from isolates linked to abortion in livestock. These isolates are part of the 100K Pathogen Genome Project and are from clinical cases at the University of California (UC) Davis. PMID:27908990

  5. Draft Genome Sequence of Pedobacter agri PB92T, Which Belongs to the Family Sphingobacteriaceae

    PubMed Central

    Lee, Myunglip; Roh, Seong Woon; Lee, Hae-Won; Yim, Kyung June; Kim, Kil-Nam; Bae, Jin-Woo; Choi, Kwang-Sik; Jeon, You-Jin; Jung, Won-Kyo; Kang, Heewan

    2012-01-01

    Strain PB92T of Pedobacter agri, which belongs to the family Sphingobacteriaceae, was isolated from soil in the Republic of Korea. The draft genome of strain PB92T contains 5,141,552 bp, with a G+C content of 38.0%. This is the third genome sequencing project of the type strains among the Pedobacter species. PMID:22740666

  6. First Complete Genome Sequence of Cherry virus A

    PubMed Central

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki

    2016-01-01

    The 5′-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5′-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  7. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to state-of-the-art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and microarrays (gene expression, genotyping, and methylation). This presentation highlights our capabilities in the area of ultra-high throughput sequencing analysis.

  8. Whole genome mapping as a fast-track tool to assess genomic stability of sequenced Staphylococcus aureus strains.

    PubMed

    Sabirova, Julia S; Xavier, Basil Britto; Ieven, Margareta; Goossens, Herman; Malhotra-Kumar, Surbhi

    2014-10-08

    Whole genome (optical) mapping (WGM), a state-of-the-art mapping technology based on the generation of high resolution restriction maps, has so far been used for typing clinical outbreak strains and for mapping de novo sequence contigs in genome sequencing projects. We employed WGM to assess the genomic stability of previously sequenced Staphylococcus aureus strains that are commonly used in laboratories as reference standards. S. aureus strains (n = 12) were mapped on the Argus™ Optical Mapping System (Opgen Inc, Gaithersburg, USA). Assembly of NcoI-restricted DNA molecules, visualization, and editing of whole genome maps were performed using the MapManager and MapSolver software (Opgen Inc). In silico whole genome NcoI-restricted maps were also generated from available sequence data, and compared to the laboratory-generated maps. Strains showing differences between the two maps were resequenced using Nextera XT DNA Sample Preparation Kit and Miseq Reagent Kit V2 (MiSeq, Illumina) and de novo assembled into sequence contigs using the Velvet assembly tool. Sequence data were correlated with corresponding whole genome maps to perform contig mapping and genome assembly using MapSolver. Of the twelve strains tested, one (USA300_FPR3757) showed a 19-kbp deletion on WGM compared to its in silico generated map and reference sequence data. Resequencing of the USA300_FPR3757 identified the deleted fragment to be a 13 kbp-long integrative conjugative element ICE6013. Frequent subculturing and inter-laboratory transfers can induce genomic, and therefore phenotypic, changes that could compromise the utility of standard reference strains. WGM can thus be used as a rapid genome screening method to identify genomic rearrangements whose size and type can be confirmed by sequencing.
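
    The in silico NcoI maps mentioned in this record amount to locating every NcoI recognition site in a sequence and listing the fragment lengths between cuts. A minimal Python sketch of that idea follows. It is not the MapSolver implementation: the function name is hypothetical, and the sequence is treated as linear and scanned with plain string matching, which are simplifying assumptions.

```python
NCOI_SITE = "CCATGG"    # NcoI recognition sequence
NCOI_CUT_OFFSET = 1     # NcoI cuts after the first C (C^CATGG)


def in_silico_fragments(sequence, site=NCOI_SITE, cut_offset=NCOI_CUT_OFFSET):
    """Return fragment lengths from digesting a linear sequence at every site.

    Hypothetical helper for illustration; real genomes are often circular and
    real tools also handle ambiguous bases, which this sketch ignores.
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive cut positions,
    # including both ends of the sequence.
    boundaries = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy 24-base sequence with two NcoI sites (not real S. aureus data).
toy_sequence = "AAAACCATGGTTTTTTCCATGGAA"
print(in_silico_fragments(toy_sequence))
```

    Comparing such a predicted fragment list with the fragment sizes measured by optical mapping is, in outline, how a missing or inserted region would show up as a mismatch between the two maps.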

  9. Draft Genome Sequence of Bacillus amyloliquefaciens B-1895

    PubMed Central

    Melnikov, Vyacheslav G.; Chistyakov, Vladimir A.

    2014-01-01

    In this report, we present a draft genome sequence of Bacillus amyloliquefaciens strain B-1895. Comparison with the genome of a reference strain demonstrated similar overall organization, as well as differences involving large gene clusters. PMID:24948774

  10. Draft Genome Sequence of Brevibacterium massiliense Strain 541308T

    PubMed Central

    Robert, Catherine; Gimenez, Grégory; Raoult, Didier

    2012-01-01

    A draft genome sequence of Brevibacterium massiliense, an aerobic bacterium isolated from a human ankle discharge, is described here. CRISPR-associated proteins were found to be encoded in the genome, and analysis of transport proteins was performed. PMID:22933772

  11. Exploring cancer genomic data from the cancer genome atlas project

    PubMed Central

    Lee, Ju-Seog

    2016-01-01

    The Cancer Genome Atlas (TCGA) has compiled genomic, epigenomic, and proteomic data from more than 10,000 samples derived from 33 types of cancer, aiming to improve our understanding of the molecular basis of cancer development. The availability of this genome-wide information provides an unprecedented opportunity for uncovering new key regulators of signaling pathways or new roles of pre-existing members in pathways. To take advantage of this advancement, it will be necessary to learn systematic approaches that can help to uncover novel genes reflecting genetic alterations, prognosis, or response to treatments. This minireview describes the updated status of the TCGA project and explains how to use TCGA data. PMID:27530686

  12. Racial/Ethnic Disparities in Genomic Sequencing.

    PubMed

    Spratt, Daniel E; Chan, Tiffany; Waldron, Levi; Speers, Corey; Feng, Felix Y; Ogunwobi, Olorunseun O; Osborne, Joseph R

    2016-08-01

    Although poorly understood, there is heterogeneity in the molecular biology of cancer across race and ethnicities. The representation of racial minorities in large genomic sequencing efforts is unclear, and could have an impact on health care disparities. The objective was to determine the racial distribution among samples sequenced within The Cancer Genome Atlas (TCGA) and the deficit of samples needed to detect moderately common mutational frequencies in racial minorities. This was a retrospective review of individual patient data from the TCGA data portal accessed in July 2015. TCGA comprises samples from a wide array of institutions primarily across the United States. Samples from 10 of the 31 currently available tumor types were analyzed, comprising 5729 samples from the approximately 11 000 available. Using the estimated median somatic mutational frequency, the samples needed beyond TCGA to detect a 10% and 5% mutational frequency over the background somatic mutation frequency were calculated for each tumor type by racial ethnicity. Of the 5729 samples, 77% (n = 4389) were white, 12% (n = 660) were black, 3% (n = 173) were Asian, 3% (n = 149) were Hispanic, and less than 0.5% combined were from patients of Native Hawaiian, Pacific Islander, Alaskan Native, or American Indian descent. This overrepresents white patients compared with the US population and underrepresents primarily Asian and Hispanic patients. With a somatic mutational frequency of 0.7 (prostate cancer) to 9.9 (lung squamous cell cancer), all tumor types from white patients contained enough samples to detect a 10% mutational frequency. This is in contrast to all other racial ethnicities, for which group-specific mutations with 10% frequency would be detectable only for black patients with breast cancer. Group-specific mutations with 5% frequency would be undetectable in any racial minority, but detectable in white patients for all cancer types except lung (adenocarcinoma and squamous cell carcinoma
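
    The percentages quoted in this record can be rechecked from the sample counts it reports. A minimal Python sketch, using only the figures above; the variable names are illustrative, and the remainder is simply whatever the four quoted groups do not account for (the abstract does not break it down):

```python
total_samples = 5729

# Sample counts quoted in the abstract, keyed by the group labels it uses.
counts = {"white": 4389, "black": 660, "Asian": 173, "Hispanic": 149}

# Whole-number percentage of the analyzed samples for each group.
shares = {group: round(100 * n / total_samples) for group, n in counts.items()}

# Samples not covered by the four groups above.
unlisted = total_samples - sum(counts.values())

for group, percent in shares.items():
    print(f"{group}: {percent}%")
print(f"not in the four listed groups: {unlisted} samples")
```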

  13. Attitudes towards the Human Genome Project.

    ERIC Educational Resources Information Center

    Shahroudi, Julie; Shaw, Geraldine

    Attitudes concerning the Human Genome Project were reported by faculty (N=40) and students (N=66) from a liberal arts college. Positive attitudes toward the project involved privacy, insurance and health, economic purposes, reproductive purposes, genetic counseling, religion and overall opinions. Negative attitudes were expressed regarding…

  15. The African Genome Variation Project shapes medical genetics in Africa

    NASA Astrophysics Data System (ADS)

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

    2015-01-01

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  16. The African Genome Variation Project shapes medical genetics in Africa.

    PubMed

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

    2015-01-15

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  17. Are We There Yet? Reliably Estimating the Completeness of Plant Genome Sequences

    PubMed Central

    2016-01-01

    Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species’ gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation. PMID:27512012

  18. Complete genome sequence of Calditerrivibrio nitroreducens type strain (Yu37-1T)

    SciTech Connect

    Pitluck, Sam; Sikorski, Johannes; Zeytun, Ahmet; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, N; Mavromatis, K; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Brambilla, Evelyne-Marie; Ngatchou, Olivier Duplex; Rohde, Manfred; Spring, Stefan; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Land, Miriam L

    2011-01-01

    Calditerrivibrio nitroreducens Iino et al. 2008 is the type species of the genus Calditerrivibrio. The species is of interest because of its important role in the nitrate cycle as nitrate reducer and for its isolated phylogenetic position in the Tree of Life. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the third complete genome sequence of a member of the family Deferribacteraceae. The 2,216,552 bp long genome with its 2,128 protein-coding and 50 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Complete genome sequence of Xylanimonas cellulosilytica type strain (XIL07T)

    SciTech Connect

    Foster, Brian; Pukall, Rudiger; Abt, Birte; Nolan, Matt; Glavina Del Rio, Tijana; Chen, Feng; Lucas, Susan; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Brettin, Thomas S; Han, Cliff; Detter, J C; Bruce, David; Goodwin, Lynne A.; Ivanova, N; Mavromatis, K; Pati, Amrita; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla L.

    2010-01-01

    Xylanimonas cellulosilytica Rivas et al. 2003 is the type species of the genus Xylanimonas of the actinobacterial family Promicromonosporaceae. The species X. cellulosilytica is of interest because of its ability to hydrolyze cellulose and xylan. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the large family Promicromonosporaceae, and the 3,831,380 bp long genome (one chromosome plus an 88,604 bp long plasmid) with its 3485 protein-coding and 61 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  20. Genome sequence of the Antarctic rhodopsins-containing flavobacterium Gillisia limnaea type strain (R-8282T)

    SciTech Connect

    Riedel, Thomas; Held, Brittany; Nolan, Matt; Lucas, Susan; Lapidus, Alla L.; Tice, Hope; Glavina Del Rio, Tijana; Cheng, Jan-Fang; Han, Cliff; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, K; Pagani, Ioanna; Ivanova, N; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Rohde, Manfred; Tindall, Brian; Detter, J. Chris; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Gillisia limnaea Van Trappen et al. 2004 is the type species of the genus Gillisia, which is a member of the well characterized family Flavobacteriaceae. The genome of G. limnaea R-8282T is the first sequenced genome (permanent draft) from a type strain of the genus Gillisia. Here we describe the features of this organism, together with the permanent-draft genome sequence and annotation. The 3,966,857 bp long chromosome (two scaffolds) with its 3,569 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Thermosediminibacter oceani type strain (JW/IW-1228PT)

    SciTech Connect

    Pitluck, Sam; Yasawong, Montri; Munk, Christine; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Spring, Stefan; Sikorski, Johannes; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Thermosediminibacter oceani (Lee et al. 2006) is the type species of the genus Thermosediminibacter in the family Thermoanaerobacteraceae. The anaerobic, barophilic, chemoorganotrophic thermophile is characterized by straight to curved Gram-negative rods. The strain described in this study has been isolated from a core sample of deep sea sediments of the Peruvian high productivity upwelling system. This is the first completed genome sequence of a member of the genus Thermosediminibacter and the seventh genome sequence in the family Thermoanaerobacteraceae. The 2,280,035 bp long genome with its 2,285 protein-coding and 63 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Sanguibacter keddieii type strain (ST-74T)

    SciTech Connect

    Ivanova, Natalia; Sikorski, Johannes; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Pati, Amrita; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D'haeseleer, Patrik; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Goker, Markus; Pukall, Rudiger; Klenk, Hans-Peter; Kyrpides, Nikos

    2009-05-20

    Sanguibacter keddieii is the type species of the genus Sanguibacter, the only described genus within the family of Sanguibacteraceae. Phylogenetically, this family is located in the neighbourhood of the genus Oerskovia and the family Cellulomonadaceae within the actinobacterial suborder Micrococcineae. The strain described in this report was isolated from blood of apparently healthy cows. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the family Sanguibacteraceae, and the 4,253,413 bp long single replicon genome with its 3735 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org.

  4. DNA Data Bank of Japan at work on genome sequence data.

    PubMed

    Tateno, Y; Fukami-Kobayashi, K; Miyazaki, S; Sugawara, H; Gojobori, T

    1998-01-01

    We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data.

  5. DNA Data Bank of Japan at work on genome sequence data.

    PubMed Central

    Tateno, Y; Fukami-Kobayashi, K; Miyazaki, S; Sugawara, H; Gojobori, T

    1998-01-01

    We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data. PMID:9399792

  6. Draft Genome Sequence for a Urinary Isolate of Nosocomiicoccus ampullae

    PubMed Central

    Hilt, Evann E.; Price, Travis K.; Diebel, Katherine; Putonti, Catherine

    2016-01-01

    A draft genome sequence for a urinary isolate of Nosocomiicoccus ampullae (UMB0853) was investigated. The size of the genome was 1,578,043 bp, with an observed G+C content of 36.1%. Annotation revealed 10 rRNA sequences, 40 tRNA genes, and 1,532 protein-coding sequences. Genome coverage was 727× and consisted of 32 contigs, with an N50 of 109,831 bp. PMID:27856579
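    The N50 quoted in the record above is a standard contiguity statistic: the contig length at which contigs of that length or longer account for at least half of all assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the contig lengths are hypothetical illustrations; the actual 32 contig lengths of UMB0853 are not given in the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the arithmetic.
example = [100, 80, 60, 40, 20]
print(n50(example))  # → 80 (100 + 80 = 180 covers at least half of 300)
```

    A higher N50 at the same total assembly size means the sequence sits in fewer, longer contigs.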

  7. Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

    PubMed

    Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

    2014-08-16

    Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in reference genome. However, there is no study to investigate the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human

  8. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory a number of fungi are used as experimental systems to study a wide range of biological processes and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  9. Genes after the human genome project.

    PubMed

    Baetu, Tudor M

    2012-03-01

    While the Human Genome Nomenclature Committee (HGNC) concept of the gene can accommodate a wide variety of genomic sequences contributing to phenotypic outcomes, it fails to specify how sequences should be grouped when dealing with complex loci consisting of adjacent/overlapping sequences contributing to the same phenotype, distant sequences shown to contribute to the same gene product, and partially overlapping sequences identified by different techniques. The purpose of this paper is to review recently proposed concepts of the gene and critically assess how well they succeed in addressing the above problems while preserving the degree of generality achieved by the HGNC concept. I conclude that a dynamic interplay between mapping and syntax-based concepts is required in order to satisfy these desiderata. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. The complete genome sequence of a dog: a perspective.

    PubMed

    Lee, Soohyun; Kasif, Simon

    2006-06-01

    A complete, high-quality reference sequence of a dog genome was recently produced by a team of researchers led by the Broad Institute, achieving another major milestone in deciphering the genomic landscape of mammalian organisms. The genome sequence provides an indispensable resource for comparative analysis and novel insights into dog and human evolution and history. Together with the survey sequence of a poodle previously published in 2003, the two dog genome sequences allowed identification of more than 2.5 million single nucleotide polymorphisms within and between dog breeds, which can be used in evolutionary analysis, behavioral studies and disease gene mapping.(1)

  11. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    PubMed

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  12. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    PubMed Central

    2011-01-01

    Background: Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results: We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions: MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets. PMID:22192575

  13. Mitochondrial genome sequences from wild and cultivated barley (Hordeum vulgare).

    PubMed

    Hisano, Hiroshi; Tsujimura, Mai; Yoshida, Hideya; Terachi, Toru; Sato, Kazuhiro

    2016-10-24

    Sequencing analysis of mitochondrial genomes is important for understanding the evolution and genome structures of various plant species. Barley is a self-pollinated diploid plant with seven chromosomes comprising a large haploid genome of 5.1 Gbp. Wild barley (Hordeum vulgare ssp. spontaneum) and cultivated barley (H. vulgare ssp. vulgare) have cross compatibility and closely related genomes, although a significant number of nucleotide polymorphisms have been reported between their genomes. We determined the complete nucleotide sequences of the mitochondrial genomes of wild and cultivated barley. Two independent circular maps of the 525,599 bp barley mitochondrial genome were constructed by de novo assembly of high-throughput sequencing reads of barley lines H602 and Haruna Nijo, with only three SNPs detected between haplotypes. These mitochondrial genomes contained 33 protein-coding genes, three ribosomal RNAs, 16 transfer RNAs, 188 new ORFs, six major repeat sequences and several types of transposable elements. Of the barley mitochondrial genome-encoded proteins, NAD6, NAD9 and RPS4 had unique structures among grass species. The mitochondrial genome of barley was similar to those of other grass species in terms of gene content, but the configuration of the genes was highly differentiated from that of other grass species. Mitochondrial genome sequencing is essential for annotating the barley nuclear genome; our mitochondrial sequencing identified a significant number of fragmented mitochondrial sequences in the reported nuclear genome sequences. Little polymorphism was detected in the barley mitochondrial genome sequences, which should be explored further to elucidate the evolution of barley.

  14. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    USDA-ARS's Scientific Manuscript database

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  15. The potato psyllid genome project

    USDA-ARS's Scientific Manuscript database

    The potato psyllid (Bactericera cockerelli) is a Hemipteran pest of solanaceous plants and limits potato and tomato production by the transmission of Candidatus Liberibacter solanacearum. Genomic information on the potato psyllid is limited but is vital in developing appropriate management strategi...

  16. Strategies for undertaking expressed sequence tag (EST) projects.

    PubMed

    Clifton, Sandra W; Mitreva, Makedonka

    2009-01-01

    Complementary DNA (cDNA) sequencing can be used to sample an organism's transcriptome, and the generated EST sequences can be used for a variety of purposes. They are especially important for enhancing the utility of a genome sequence or for providing a gene catalog for a genome that has not or will not be sequenced. In planning and executing a cDNA project, several criteria must be considered. One should clearly define the project purpose, including organism tissue(s) choice, whether those tissues should be pooled, ability to acquire adequate amounts of clean and well-preserved tissue, choice of type(s) of library, and construction of a library (or libraries) that is compatible with project goals. In addition, one must possess the skills to construct the library (or libraries), keeping in mind the number of clones that will be necessary to meet the project requirements. If one is inexperienced in cDNA library construction, it might be wise to outsource the library production and/or sequence and analysis to a sequencing center or to a company that specializes in those activities. One should also be aware that new sequencing platforms are being marketed that may offer simpler protocols that can produce cDNA data in a more rapid and economical manner. Of course, the bioinformatics tools will have to be in place to de-convolute and aid in data analysis for these newer technologies. Possible funding sources for these projects include well-justified grant proposals, private funding, and/or collaborators with available funds.

  17. Complete genome sequence of Jonesia denitrificans type strain (Prevot 55134T)

    SciTech Connect

    Pukall, Rudiger; Gehrich-Schroeter, Gabriele; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Copeland, A; Saunders, Elizabeth H; Detter, J. Chris; Bruce, David; Goodwin, Lynne A.; Pati, Amrita; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Han, Cliff

    2009-01-01

    Jonesia denitrificans (Prevot 1961) Rocourt et al. 1987 is the type species of the genus Jonesia, and is of phylogenetic interest because of its isolated location in the actinobacterial suborder Micrococcineae. J. denitrificans is characterized by a typical coryneform morphology and is able to form irregular nonsporulating rods showing branched and clublike forms. Coccoid cells occur in older cultures. J. denitrificans is classified as a pathogenic organism for animals (vertebrates). The type strain whose genome is described here was originally isolated from cooked ox blood. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the genus for which a complete genome sequence is described. The 2,749,646 bp long genome with its 2558 protein-coding and 71 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  18. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-20

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.

  19. The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now

    PubMed Central

    Engel, Stacia R.; Dietrich, Fred S.; Fisk, Dianna G.; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C.; Dwight, Selina S.; Hitz, Benjamin C.; Karra, Kalpana; Nash, Robert S.; Weng, Shuai; Wong, Edith D.; Lloyd, Paul; Skrzypek, Marek S.; Miyasato, Stuart R.; Simison, Matt; Cherry, J. Michael

    2014-01-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called “S288C 2010,” was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  20. Efficient analysis of mouse genome sequences reveal many nonsense variants

    PubMed Central

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

    2016-01-01

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
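    The in silico step described in the record above (substitute strain variants into the reference coding sequence, translate both, compare the amino acid sequences) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it handles SNPs only (no indels), uses the standard genetic code, and the function names, the 0-based SNP positions, and the toy sequence are all hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snps(cds, snps):
    """Return cds with each (0-based position, alternate base) substitution applied."""
    seq = list(cds)
    for pos, alt in snps:
        seq[pos] = alt
    return "".join(seq)


def premature_stop(ref_cds, alt_cds):
    """Return the codon index of a gained stop in alt_cds, or None if there is none."""
    ref_stop = translate(ref_cds).find("*")
    alt_stop = translate(alt_cds).find("*")
    if alt_stop == -1:
        return None
    if ref_stop == -1 or alt_stop < ref_stop:
        return alt_stop
    return None


# Toy reference CDS (hypothetical): ATG CAA GGT TGC TAA -> M Q G C *
ref = "ATGCAAGGTTGCTAA"
nonsense = apply_snps(ref, [(3, "T")])    # CAA -> TAA, a gained stop codon
synonymous = apply_snps(ref, [(5, "G")])  # CAA -> CAG, still glutamine

print(translate(ref))                   # → MQGC*
print(premature_stop(ref, nonsense))    # → 1
print(premature_stop(ref, synonymous))  # → None
```

    Comparing translated sequences rather than nucleotides keeps synonymous changes from being flagged, which matches the abstract's focus on variants that introduce a stop codon.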

  1. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  2. Cancer Genome Anatomy Project | Office of Cancer Genomics

    Cancer.gov

    The National Cancer Institute (NCI) Cancer Genome Anatomy Project (CGAP) is an online resource designed to provide the research community access to biological tissue characterization data. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov.

  3. Cancer Genome Anatomy Project | Office of Cancer Genomics

    Cancer.gov

    The National Cancer Institute (NCI) Cancer Genome Anatomy Project (CGAP) is an online resource designed to provide the research community access to biological tissue characterization data. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov.

  4. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.

  5. Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman

    DTIC Science & Technology

    2016-10-27

    Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA. Running head: Complete Genome Sequence of Y. pestis strain Cadman... Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman. Sean Lovett, Kitty Chase, Galina Koroleva, Gustavo... we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain lacking the pgm locus. Y. pestis is the causative agent of

  6. The African Genome Variation Project shapes medical genetics in Africa

    PubMed Central

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

    2014-01-01

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054

  7. First complete genome sequence of Bacillus glycinifermentans B-27.

    PubMed

    Stadermann, Kai Bernd; Blom, Jochen; Borgmeier, Claudia; Sciberras, Natalie; Herbold, Sandra; Kipker, Maike; Meurer, Guido; Molck, Stella; Petri, Daniel; Pelzer, Stefan; Schneider, Jessica

    2017-09-10

    The first complete genome sequence of Bacillus glycinifermentans B-27 was determined by SMRT sequencing generating a genome sequence with a total length of 4,607,442 bases. Based on this sequence 4738 protein-coding sequences were predicted and used to identify gene clusters that are related to the production of secondary metabolites such as Lichenysin, Bacillibactin and Bacitracin. This genomic potential combined with the ability of B. glycinifermentans B-27 to grow in bile-containing media might contribute to a future application of this strain as a probiotic in productive livestock potentially inhibiting competing and pathogenic organisms. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  8. Complete genome sequence of Halomonas sp. R5-57.

    PubMed

    Williamson, Adele; De Santi, Concetta; Altermark, Bjørn; Karlsen, Christian; Hjerde, Erik

    2016-01-01

    The marine Arctic isolate Halomonas sp. R5-57 was sequenced as part of a bioprospecting project which aims to discover novel enzymes and organisms from low-temperature environments, with potential uses in biotechnological applications. Phenotypically, Halomonas sp. R5-57 exhibits high salt tolerance over a wide range of temperatures and has extra-cellular hydrolytic activities with several substrates, indicating it secretes enzymes which may function in high salinity conditions. Genome sequencing identified the genes involved in the biosynthesis of the osmoprotectant ectoine, which has applications in food processing and pharmacy, as well as those involved in production of polyhydroxyalkanoates, which can serve as precursors to bioplastics. The percentage identity of these biosynthetic genes from Halomonas sp. R5-57 and current production strains varies between 99 % for some to 69 % for others, thus it is plausible that R5-57 may have a different production capacity to currently used strains, or that in the case of PHAs, the properties of the final product may vary. Here we present the finished genome sequence (LN813019) of Halomonas sp. R5-57 which will facilitate exploitation of this bacterium; either as a whole-cell production host, or by recombinant expression of its individual enzymes.

  9. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information were also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  10. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes.

    PubMed

    Kalbfleisch, Ted; Heaton, Michael P

    2013-01-01

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site; 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene

  11. The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data

    PubMed Central

    Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

    2017-01-01

    The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data—previously only browseable through our FTP site—by focusing on particular samples, populations or data sets of interest. PMID:27638885

  12. The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data.

    PubMed

    Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

    2017-01-04

    The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    USDA-ARS's Scientific Manuscript database

    The whole genome of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...
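
    As an illustration of how SSRs can be located computationally, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The record does not state the search criteria used for cucumber, so the motif lengths (2 to 6 bp), the minimum of four repeat units, and the function name are assumptions chosen only for the example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest motif, so
    ATATATAT is reported as (AT)4 rather than (ATAT)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# Toy scaffold fragment (hypothetical sequence)
print(find_ssrs("GGATATATATATCC"))
```

    Start positions are 0-based. Imperfect or compound repeats, which dedicated SSR tools usually also report, are outside the scope of this sketch.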

  14. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
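
    The accuracy figure quoted above can be put on the Phred scale, Q = -10 log10(p), which is the usual way of expressing sequencing error rates. The short sketch below does that arithmetic and also multiplies the stated error rate by the stated sequence length; the variable and function names are illustrative only, and the error rate is treated as exactly 1 in 100,000 although the record gives it as approximate.

```python
import math


def phred_quality(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality score."""
    return -10 * math.log10(error_rate)


genome_bases = 2.85e9     # Build 35 length as stated in the record
error_rate = 1 / 100_000  # approximate error rate as stated in the record

quality = phred_quality(error_rate)          # an error rate of 1e-5 is Q50
expected_errors = genome_bases * error_rate  # about 28,500 events genome-wide

print(round(quality), round(expected_errors))
```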

  15. Microbial genome sequencing using optical mapping and Illumina sequencing

    USDA-ARS's Scientific Manuscript database

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  16. Genome sequence of Brevibacillus laterosporus strain GI-9.

    PubMed

    Sharma, Vikas; Singh, Pradip K; Midha, Samriti; Ranjan, Manish; Korpole, Suresh; Patil, Prabhu B

    2012-03-01

    We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains.

  17. Draft Genome Sequence of Cystobacter ferrugineus Strain Cbfe23

    PubMed Central

    Akbar, Shukria; Dowd, Scot E.

    2017-01-01

    ABSTRACT In an effort to explore myxobacterial natural product biosynthetic pathways, the draft genome sequence of Cystobacter ferrugineus strain Cbfe23 has been obtained. Analysis of the genome using antiSMASH suggests a multitude of unique natural product biosynthetic pathways. This genome will contribute to the investigation of secondary metabolism in other myxobacterial species. PMID:28183768

  18. Draft Genome Sequence of Archangium sp. Strain Cb G35

    PubMed Central

    Adaikpoh, Barbara I.; Dowd, Scot E.

    2017-01-01

    ABSTRACT In an effort to explore myxobacterial natural product biosynthetic pathways, the draft genome sequence of Archangium sp. strain Cb G35 has been obtained. Analysis of the genome using antiSMASH predicts 49 natural product biosynthetic pathways. This genome will contribute to the investigation of myxobacterial secondary metabolite biosynthetic pathways. PMID:28232451

  19. Crustacean oxi-reductases protein sequences derived from a functional genomic project potentially involved in ecdysteroid hormones metabolism - a starting point for function examination.

    PubMed

    Tom, Moshe; Manfrin, Chiara; Giulianini, Piero G; Pallavicini, Alberto

    2013-12-01

    A transcriptomic assembly originating from hypodermis and Y organ of the crustacean Pontastacus leptodactylus is used here for in silico characterization of oxi-reductase enzymes potentially involved in the metabolism of ecdysteroid molting hormones. RNA samples were extracted from male Y organ and its neighboring hypodermis in all stages of the molt cycle. An equimolar RNA mix from all stages was sequenced using next generation sequencing technologies and de novo assembled, resulting in 74,877 unique contigs. These transcript sequences were annotated by examining their resemblance to all GenBank translated transcripts, determining their Gene Ontology terms and their characterizing domains. Based on the present knowledge of arthropod ecdysteroid metabolism and more generally on steroid metabolism in other taxa, transcripts potentially related to ecdysteroid metabolism were identified and their longest possible conceptual protein sequences were constructed in two stages: the correct reading frame was deduced from BLASTX resemblances, followed by elongation of the protein sequence by identifying the correct translation frame of the original transcript. The analyzed genes belonged to several oxi-reductase superfamilies including the Rieske non heme iron oxygenases, cytochrome P450s, short-chained hydroxysteroid oxi-reductases, aldo/keto oxi-reductases, lamin B receptor/sterol reductases and glucose-methanol-choline oxi-reductases. A total of 68 proteins were characterized and the most probable participants in the ecdysteroid metabolism were indicated. The study provides transcript and protein structural information, a starting point for further functional studies, using a variety of gene-specific methods to demonstrate or disprove the roles of these proteins in relation to ecdysteroid metabolism in P. leptodactylus.

  20. Cancer Genome Sequencing: Understanding Malignancy as a Disease of the Genome, its Conformation, and its Evolution

    PubMed Central

    Patel, Lalit R.; Nykter, Matti; Chen, Kexin; Zhang, Wei

    2013-01-01

    Advances in cancer genomics have been propelled by the steady evolution of molecular profiling technologies. Over the past decade, high-throughput sequencing technologies have matured to the point necessary to support disease-specific shotgun sequencing. This has compelled whole-genome sequencing studies across a broad panel of malignancies. The emergence of high-throughput sequencing technologies has inspired new chemical and computational techniques enabling interrogation of cancer-specific genomic and transcriptomic variants, previously unannotated genes, and chromatin structure. Finally, recent progress in single-cell sequencing holds great promise for studies interrogating the consequences of tumor evolution in cancers presenting with genomic heterogeneity. PMID:23111104

  1. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

    PubMed

    Sarovich, Derek S; Price, Erin P

    2014-09-08

    Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.
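
    To make the idea of a SNP matrix concrete, the sketch below assembles per-sample variant calls into an alignment of variable sites and writes it in a PHYLIP-style layout of the kind phylogenetic programs read. It is not SPANDx code: the data structures, function names, and sample data are hypothetical, and it assumes that a sample with no call at a site carries the reference base, which ignores low-coverage or missing positions that a real pipeline would have to flag.

```python
def build_snp_matrix(calls, reference):
    """Combine per-sample SNP calls into one string of bases per sample.

    calls:     {sample_name: {position: alternate_base}}
    reference: {position: reference_base} for every position seen in calls
    Returns (sorted positions, {sample_name: bases at those positions}).
    """
    positions = sorted({p for sample_calls in calls.values() for p in sample_calls})
    matrix = {
        sample: "".join(sample_calls.get(p, reference[p]) for p in positions)
        for sample, sample_calls in calls.items()
    }
    return positions, matrix


def to_phylip(matrix):
    """Format the matrix as strict PHYLIP: header line, then 10-character names."""
    n_sites = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_sites}"]
    lines += [f"{name[:10]:<10}{seq}" for name, seq in matrix.items()]
    return "\n".join(lines)


# Hypothetical calls for three haploid isolates against one reference
calls = {"s1": {100: "T"}, "s2": {100: "T", 250: "G"}, "s3": {}}
reference = {100: "C", 250: "A"}

positions, matrix = build_snp_matrix(calls, reference)
print(to_phylip(matrix))
```

    Only variable sites are kept, which keeps the alignment small; tree-building programs may need an ascertainment-bias correction when given such input.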

  2. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge

    PubMed Central

    Reveille, Alexandra M.; Eldridge, Kimberly A.

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  3. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219

    PubMed Central

    Abouelkhair, Mohamed A.; Riley, Matthew C.; Bemis, David A.

    2017-01-01

    ABSTRACT We report the first complete genome sequence of LMG 22219 (=ON 86T = CCUG 49543T), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level. PMID:28209834

  4. Draft Genome Sequence of the Pelagic Photoferrotroph Chlorobium phaeoferrooxidans

    PubMed Central

    Hahn, Aria S.; Morgan-Lang, Connor; Thompson, Katherine J.; Simister, Rachel L.; Llirós, Marc; Hirst, Martin; Hallam, Steven J.

    2017-01-01

    ABSTRACT Here, we report the draft genome sequence of Chlorobium phaeoferrooxidans, a photoferrotrophic member of the genus Chlorobium in the phylum Chlorobi. This genome sequence provides insight into the metabolic capacity that underpins photoferrotrophy within low-light-adapted pelagic Chlorobi. PMID:28360175

  5. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    PubMed Central

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  6. Draft Genome Sequence of Rhodococcus sp. Strain 66b

    PubMed Central

    Myers, Cindy A.; O’Sullivan, Cathryn A.; Roper, Margaret M.

    2017-01-01

    ABSTRACT We report here the draft genome sequence and annotation of Rhodococcus sp. strain 66b isolated from the soil of southwest Western Australia. This strain exhibits a range of bioactivities, including plant growth promotion, biosurfactant production, and wax degradation. Whole-genome sequencing was conducted to uncover the underlying mechanisms. PMID:28546474

  7. Complete Genome Sequence of Lactobacillus plantarum CGMCC 8198

    PubMed Central

    Dong, Qing-Qing; Hu, Hai-Jie; Wang, Qiu-Tong; Gu, Xiang-Chao; Zhou, Hao; Zhou, Wen-Juan; Ni, Xiao-Meng

    2017-01-01

    ABSTRACT We report the complete genome sequence of Lactobacillus plantarum CGMCC 8198, a novel probiotic strain isolated from fermented herbage. We have determined the complete genome sequence of strain L. plantarum CGMCC 8198, which consists of genes that are likely to be involved in dairy fermentation and that have probiotic qualities. PMID:28183756

  8. Complete Genome Sequence of Burkholderia cepacia Strain LO6

    PubMed Central

    Belcaid, Mahdi; Kang, Yun; Tuanyok, Apichai

    2015-01-01

    Burkholderia cepacia strain LO6 is a betaproteobacterium that was isolated from a cystic fibrosis patient. Here we report the 6.4 Mb draft genome sequence assembled into 2 contigs. This genome sequence will aid the transcriptomic profiling of this bacterium and help us to better understand the mechanisms specific to pulmonary infections. PMID:26067955

  9. Complete Genome Sequence of Burkholderia cepacia Strain LO6.

    PubMed

    Belcaid, Mahdi; Kang, Yun; Tuanyok, Apichai; Hoang, Tung T

    2015-06-11

    Burkholderia cepacia strain LO6 is a betaproteobacterium that was isolated from a cystic fibrosis patient. Here we report the 6.4 Mb draft genome sequence assembled into 2 contigs. This genome sequence will aid the transcriptomic profiling of this bacterium and help us to better understand the mechanisms specific to pulmonary infections.

  10. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    SciTech Connect

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-03-05

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  11. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    DOE PAGES

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; ...

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  12. Genome Sequence of Pasteurella multocida Strain Razi_Pm0001

    PubMed Central

    Tadayon, Keyvan

    2017-01-01

    ABSTRACT We report here the genome sequence of Pasteurella multocida Razi_Pm0001 of bovine origin, isolated in Iran in 1936. The genome has a size of 2,360,663 bp, a G+C content of 40.4%, and is predicted to contain 2,052 coding sequences. PMID:28153892
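
    Genome announcements like this one summarize an assembly by its length, G+C content, and number of predicted coding sequences. A minimal sketch of how the G+C percentage can be computed from a sequence string follows; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not the authors' method. The last lines derive a coding-sequence density from the two figures given in the record.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Toy sequence (hypothetical); ambiguous N bases are ignored
print(gc_content("GGCCAATTNN"))

# Density implied by the record: 2,052 coding sequences in 2,360,663 bp
genome_bp = 2_360_663
coding_sequences = 2_052
cds_per_kb = coding_sequences / (genome_bp / 1000)  # roughly 0.87 per kb
print(round(cds_per_kb, 2))
```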

  13. Complete genome sequence of Enterobacter aerogenes KCTC 2190.

    PubMed

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

    2012-05-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

  14. Complete Genome Sequences of Five Paenibacillus larvae Bacteriophages.

    PubMed

    Sheflo, Michael A; Gardner, Adam V; Merrill, Bryan D; Fisher, Joshua N B; Lunt, Bryce L; Breakwell, Donald P; Grose, Julianne H; Burnett, Sandra H

    2013-11-14

    Paenibacillus larvae is a pathogen of honeybees that causes American foulbrood (AFB). We isolated bacteriophages from soil containing bee debris collected near beehives in Utah. We announce five high-quality complete genome sequences, which represent the first completed genome sequences submitted to GenBank for any P. larvae bacteriophage.

  15. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037.

    PubMed

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E; Schäffer, Christina

    2015-06-11

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  16. Draft Genome Sequence of Micrococcus luteus (Schroeter) Cohn (ATCC 12698).

    PubMed

    Putonti, Catherine; Cudone, Evan; Kalesinskas, Laurynas; Engelbrecht, Kathleen C; Koenig, David W; Wolfe, Alan J

    2017-07-06

    The actinobacterium Micrococcus luteus can be found in a wide variety of habitats. Here, we report the 2,411,958-bp draft genome sequence of the type strain M. luteus (Schroeter) Cohn (ATCC 12698). Characteristic of this taxon, the genome sequence has a high G+C content, 73.14%. Copyright © 2017 Putonti et al.

  17. Full Genome Sequence of Giant Panda Rotavirus Strain CH-1

    PubMed Central

    Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang

    2013-01-01

    We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists. PMID:23469354

  18. Almost finished: the complete genome sequence of Mycosphaerella graminicola

    USDA-ARS's Scientific Manuscript database

    Mycosphaerella graminicola causes septoria tritici blotch of wheat. An 8.9x shotgun sequence of bread wheat strain IPO323 was generated through the Community Sequencing Program of the U.S. Department of Energy’s Joint Genome Institute (JGI), and was finished at the Stanford Human Genome Center. The ...

  19. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  20. Draft genome sequence of Kocuria rhizophila P7-4.

    PubMed

    Kim, Woo-Jin; Kim, Young-Ok; Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Lee, Jun-Seo; Kong, Hee Jeong; Nam, Bo-Hye; Kim, Bong-Seok; Lee, Sang-Jun; Park, Hong-Seog; Chae, Sung-Hwa

    2011-08-01

    We report the draft genome sequence of Kocuria rhizophila P7-4, which was isolated from the intestine of Siganus doliatus caught in the Pacific Ocean. The 2.83-Mb genome sequence consists of 75 large contigs (>100 bp in size) and contains 2,462 predicted protein-coding genes.

  1. Draft Genome Sequence of Vibrio (Listonella) anguillarum ATCC 14181

    PubMed Central

    Grim, Christopher J.

    2016-01-01

    We report the draft genome sequence of Vibrio anguillarum ATCC 14181, a Gram-negative, hemolytic, O2 serotype marine bacterium that causes mortality in mariculture species. The availability of this genome sequence will add to our knowledge of diversity and virulence mechanisms of Vibrio anguillarum as well as other pathogenic Vibrio spp. PMID:27795288

  2. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  3. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  4. Draft Genome Sequence of the Suttonella ornithocola Bacterium

    PubMed Central

    Waldman Ben-Asher, Hiba; Yerushalmi, Rebecca; Wachtel, Chaim; Barbiro-Michaely, Efrat

    2017-01-01

    We report here the draft genome sequence of the Suttonella ornithocola bacterium. To date, this bacterium, found in birds, has been subjected only to phylogenetic and phenotypic analyses. To our knowledge, this is the first publication of the Suttonella ornithocola genome sequence. The genetic profile provides a basis for further analysis of its infection pathways. PMID:28209820

  5. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219.

    PubMed

    Abouelkhair, Mohamed A; Riley, Matthew C; Bemis, David A; Kania, Stephen A

    2017-02-16

    We report the first complete genome sequence of LMG 22219 (=ON 86(T) = CCUG 49543(T)), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level.

  6. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    PubMed Central

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
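    The transition/transversion split reported above rests on a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine swap is a transversion. The sketch below illustrates that rule only; the function name is hypothetical and this is not the pipeline used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical alleles and anything that is not a single A/C/G/T base.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snp("A", "G"))  # purine to purine
print(classify_snp("A", "C"))  # purine to pyrimidine
```

    With the reported proportions of 60% transitions and 40% transversions, the transition/transversion ratio works out to 60 / 40 = 1.5.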

  7. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L.

    PubMed

    Clarke, Wayne E; Parkin, Isobel A; Gajardo, Humberto A; Gerhardt, Daniel J; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G; Snowdon, Rod J; Federico, Maria L; Iniguez-Luy, Federico L

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci -QTL- analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species.

  8. Haplotype-resolved genome sequencing of a Gujarati Indian individual.

    PubMed

    Kitzman, Jacob O; Mackenzie, Alexandra P; Adey, Andrew; Hiatt, Joseph B; Patwardhan, Rupali P; Sudmant, Peter H; Ng, Sarah B; Alkan, Can; Qiu, Ruolan; Eichler, Evan E; Shendure, Jay

    2011-01-01

    Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.
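    The N50 figure quoted for the haplotype blocks is a length-weighted median: it is the block length at which blocks of that size or longer account for at least half of the total phased length. A minimal sketch of the usual calculation, using made-up block lengths rather than data from the study:

```python
def n50(lengths):
    """Return the N50 of a collection of block (or contig) lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical block lengths in kilobases, for illustration only.
blocks_kb = [80, 70, 50, 40, 30, 20]
print(n50(blocks_kb))
```

    Because the statistic is weighted by length, a few very long blocks can raise N50 substantially even when most blocks are short, which is why it is usually reported alongside the block count or total phased length.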

  9. Genome sequences published outside of Standards in Genomic Sciences, January – June 2011

    PubMed Central

    Nelson, Oranmiyan W.; Garrity, George M.

    2011-01-01

    The purpose of this table is to provide the community with a citable record of publications of ongoing genome sequencing projects that have led to a publication in the scientific literature. While our goal is to make the list complete, there is no guarantee that we have not omitted one or more publications appearing in this time frame. Readers and authors who wish to have publications added to subsequent versions of this list are invited to provide the bibliographic data for such references to the SIGS editorial office.

  10. Genome sequences published outside of Standards in Genomic Sciences, October - November 2012

    PubMed Central

    Nelson, Oranmiyan W.; Garrity, George M.

    2012-01-01

    The purpose of this table is to provide the community with a citable record of publications of ongoing genome sequencing projects that have led to a publication in the scientific literature. While our goal is to make the list complete, there is no guarantee that we have not omitted one or more publications appearing in this time frame. Readers and authors who wish to have publications added to subsequent versions of this list are invited to provide the bibliographic data for such references to the SIGS editorial office.

  11. Genome sequences published outside of Standards in Genomic Sciences, July - October 2012

    PubMed Central

    Nelson, Oranmiyan W.; Garrity, George M.

    2012-01-01

    The purpose of this table is to provide the community with a citable record of publications of ongoing genome sequencing projects that have led to a publication in the scientific literature. While our goal is to make the list complete, there is no guarantee that we have not omitted one or more publications appearing in this time frame. Readers and authors who wish to have publications added to subsequent versions of this list are invited to provide the bibliographic data for such references to the SIGS editorial office.

  12. Genome sequences published outside of Standards in Genomic Sciences, May-June 2012

    PubMed Central

    Nelson, Oranmiyan W.; Garrity, George M.

    2012-01-01

    The purpose of this table is to provide the community with a citable record of publications of ongoing genome sequencing projects that have led to a publication in the scientific literature. While our goal is to make the list complete, there is no guarantee that we have not omitted one or more publications appearing in this time frame. Readers and authors who wish to have publications added to subsequent versions of this list are invited to provide the bibliographic data for such references to the SIGS editorial office.

  13. From complete genome sequence to “complete” understanding?

    PubMed Central

    Galperin, Michael Y.; Koonin, Eugene V.

    2011-01-01

    The rapidly accumulating genome sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. A major problem in genomics is the widening gap between the rapid progress in genome sequencing and the comparatively slow progress in the functional characterization of sequenced genomes. Here we discuss two key questions of genome biology: whether we need more genomes, and how deep is our understanding of biology based on genomic analysis. We argue that overly specific annotations of gene functions are often less useful than the more generic, but also more robust, functional assignments based on protein family classification. We also discuss problems in understanding the functions of the remaining “conserved hypothetical” genes. PMID:20647113

  14. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation

    SciTech Connect

    Kelleher, Colin; Chiu, Readman; Shin, Heesun; Bosdet, Ian; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; DiFazio, Stephen P; Ali, Johar; Asano, Jennifer; Chan, Susanna; Cloutier, Alison; Girn, Noreen; Leach, Stephen; Lee, Darlene; Mathewson, Carrie; Olson, Teika; O'Connor, Katie; Prabhu, Anna-Liisa; Smailus, Duane; Stott, Jeffery; Tsai, Miranda; Wye, Natasaja; Yang, George; Zhuang, Jun; Holt, Robert A.; Putnam, Nicholas; Vrebalov, Julia; Giovannoni, James; Grimwood, Jane; Schmutz, Jeremy; Rokhsar, Daniel; Jones, Steven; Marra, Marco; Tuskan, Gerald A; Bohlmann, J.; Ellis, Brian; Ritland, Kermit; Douglas, Carl; Schein, Jacqueline

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the first maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2,802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the 485 ± 10 Mb Populus genome, as estimated from the genome sequence assembly. BAC ends were sequenced to aid in long-range assembly of whole genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat (SSR)-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. 2,411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa v1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.

  15. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    Members of the genus Endornavirus are double-stranded RNA viruses that infect a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our results demonstrate the successful application of RNA-Seq to obtain a complete viral genome sequence from transcriptome data.

  16. The Oryza map alignment project: Construction, alignment and analysis of 12 BAC fingerprint/end sequence framework physical maps that represent the 10 genome types of genus Oryza

    USDA-ARS's Scientific Manuscript database

    The Oryza Map Alignment Project (OMAP) provides the first comprehensive experimental system for understanding the evolution, physiology and biochemistry of a full genus in plants or animals. We have constructed twelve deep-coverage BAC libraries that are representative of both diploid and tetraploid...

  17. The Brachypodium genome sequence: a resource for oat genomics research

    USDA-ARS's Scientific Manuscript database

    Oat (Avena sativa) is an important cereal crop used as both an animal feed and for human consumption. Genetic and genomic research on oat is hindered because it is hexaploid and possesses a large (13 Gb) genome. Diploid Avena relatives have been employed for genetic and genomic studies, but only mod...

  18. Complete genome sequence of Thermocrinis albus type strain (HI 11/12T)

    SciTech Connect

    Wirth, Reinhard; Sikorski, Johannes; Brambilla, Evelyne-Marie; Misra, Monica; Lapidus, Alla L.; Copeland, A; Nolan, Matt; Lucas, Susan; Chen, Feng; Cheng, Jan-Fang; Tice, Hope; Han, Cliff; Detter, J. Chris; Tapia, Roxanne; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Pati, Amrita; Anderson, Iain; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Bilek, Yvonne; Hader, Thomas; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Tindall, Brian; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Thermocrinis albus Eder and Huber 2002 is one of three species in the genus Thermocrinis in the family Aquificaceae. Members of this family have become of significant interest because of their involvement in global biogeochemical cycles in high-temperature ecosystems. This interest had already spurred several genome sequencing projects for members of the family. We here report the first completed genome sequence of a member of the genus Thermocrinis and the first type strain genome from a member of the family Aquificaceae. The 1,500,577 bp long genome with its 1,603 protein-coding and 47 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. 
The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  20. Draft Sequences of the Radish (Raphanus sativus L.) Genome

    PubMed Central

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-01-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  1. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  2. Genome Sequence of Human Rhinovirus A22, Strain Lancaster/2015

    PubMed Central

    Atkinson, Kate V.; Bishop, Lisa A.; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R.; Hegarty, Matthew J.; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M.; Pickup, Roger W.; Wilkinson, Mark

    2017-01-01

    The genome of human rhinovirus A22 (HRV-A22) was assembled by deep sequencing RNA samples from nasopharyngeal swabs. The assembled genome is 8.7% divergent from the HRV-A22 reference strain over its full length, and it is only the second full-length genome sequence for HRV-A22. The new strain is designated strain HRV-A22/Lancaster/2015. PMID:28336607

  3. Tyrosine kinome sequencing of pediatric acute lymphoblastic leukemia: a report from the Children's Oncology Group TARGET Project | Office of Cancer Genomics

    Cancer.gov

    TARGET researchers sequenced the tyrosine kinome and downstream signaling genes in 45 high-risk pediatric ALL cases with activated kinase signaling, including Ph-like ALL, to establish the incidence of tyrosine kinase mutations in this cohort. The study confirmed previously identified somatic mutations in JAK and FLT3, but did not find novel alterations in any additional tyrosine kinases or downstream genes. The mechanism of kinase signaling activation in this high-risk subgroup of pediatric ALL remains largely unknown.

  4. NCBI Reference Sequence project: update and current status.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2003-01-01

    The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central dogma. Nucleotide and protein sequences are explicitly linked on a residue-by-residue basis in this collection. Ideally all molecule types will be available for each well-studied organism, but the initial database collection pragmatically includes only those molecules and organisms that are most readily identified. Thus different amounts of information are available for different organisms at any given time. Furthermore, for some organisms additional intermediate records are provided when the genome sequence is not yet finished. The collection is supplied by NCBI through three distinct pipelines in addition to collaborations with community groups. The collection is curated on an ongoing basis. Additional information about the NCBI RefSeq project is available at http://www.ncbi.nih.gov/RefSeq/.

  5. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare

    PubMed Central

    2012-01-01

    Background: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids. PMID:22340285
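    The coverage figure in this abstract can be related to the sequencing yield with the usual approximation, mean coverage = total sequenced bases / genome length. The sketch below works backwards from the two reported numbers; it assumes that simple ratio was used, which the abstract does not state, and the function names are illustrative.

```python
def mean_coverage(total_bases: float, genome_length: float) -> float:
    """Average depth, assuming reads are spread evenly over the genome."""
    return total_bases / genome_length


def implied_genome_length(total_bases: float, coverage: float) -> float:
    """Genome length implied by a sequencing yield and a reported mean coverage."""
    return total_bases / coverage


total_bases = 59.6e9      # 59.6 Gb of sequence, as reported
reported_coverage = 24.7  # 24.7X average coverage, as reported

length = implied_genome_length(total_bases, reported_coverage)
print(f"Implied genome length: {length / 1e9:.2f} Gb")
```

    Under that assumption the two reported figures imply a denominator of roughly 2.41 Gb. Mapped-read coverage would be somewhat lower than this raw estimate if unmapped or duplicate reads were excluded.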

  6. The genome sequence of Schizosaccharomyces pombe.

    PubMed

    Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

    2002-02-21

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

  7. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  8. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
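    The ortholog counts quoted in this abstract can be expressed as a fraction of annotated human genes. The short sketch below only restates that arithmetic; the variable names are illustrative and nothing beyond the two counts in the abstract is assumed.

```python
# Counts quoted in the abstract.
human_genes_with_dog_fragments = 18_473
annotated_human_genes = 24_567

# Share of annotated human genes with at least a fragment of a putative dog ortholog.
fraction = human_genes_with_dog_fragments / annotated_human_genes
print(f"{fraction:.1%} of annotated human genes have putative ortholog fragments")
```

    This works out to about three quarters of the annotated human genes, recovered from only 1.5x coverage.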

  9. Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT)

    SciTech Connect

    Ivanova, Natalia; Gronow, Sabine; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Liz; Bruce, David; Goodwin, Lynne; Brettin, Thomas; Detter, John C.; Han, Cliff; Pitluck, Sam; Mikhailova, Natalia; Pati, Amrita; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Rohde, Christine; Goker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2009-05-20

    Leptotrichia buccalis (Robin 1853) Trevisan 1879 is the type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically adequately accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. Species of Leptotrichia are large, fusiform, non-motile, non-sporulating rods, which often populate the human oral flora. L. buccalis is anaerobic to aerotolerant, and saccharolytic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order 'Fusobacteriales' and no more than the second sequence from the phylum 'Fusobacteria'. The 2,465,610 bp long single replicon genome with its 2,306 protein-coding and 61 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Complete genome sequence of Brachyspira murdochii type strain (56-150T)

    PubMed Central

    Pati, Amrita; Sikorski, Johannes; Gronow, Sabine; Munk, Christine; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Spring, Stefan; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Brachyspira murdochii Stanton et al. 1992 is a non-pathogenic, host-associated spirochete of the family Brachyspiraceae. Initially isolated from the intestinal content of a healthy swine, the ‘group B spirochaetes’ were first described as Serpulina murdochii. Members of the family Brachyspiraceae are of great phylogenetic interest because of the extremely isolated location of this family within the phylum ‘Spirochaetes’. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a type strain of a member of the family Brachyspiraceae and only the second genome sequence from a member of the genus Brachyspira. The 3,241,804 bp long genome with its 2,893 protein-coding and 40 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304710

  11. Whole genome sequencing in the undergraduate classroom: outcomes and lessons from a pilot course.

    PubMed

    Drew, Jennifer C; Triplett, Eric W

    2008-01-01

    The BIO2010 report challenged undergraduate institutions to prepare the next generation of researchers for the changing direction of biology that increasingly integrates advanced technologies, digital information, and large-scale analyses. In response, the Microbiology and Cell Science Department at the University of Florida developed a research-based course, "Bacterial Genome Sequencing." The objectives were to teach undergraduates about genomics and original research by sequencing a bacterial genome, to develop scientific communication skills by writing and submitting the project results as a class effort, and to promote an interest in biological research, particularly genomics. The students worked together to sequence, assemble, and annotate the Enterobacter cloacae P101 genome. We assessed student learning, scientific communication skills, and student attitudes by a variety of methods including exams, writing assignments, oral presentations, pre- and postcourse surveys, and a final exit survey. Assessment results demonstrate student learning gains and positive attitudes regarding the course.

  12. Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Brambilla, Evelyne-Marie; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, J. Chris; Bruce, David; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Spring, Stefan; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT)

    SciTech Connect

    Ivanova, N; Gronow, Sabine; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Detter, J. Chris; Han, Cliff; Pitluck, Sam; Mikhailova, Natalia; Pati, Amrita; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Rohde, Christine; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2009-01-01

    Leptotrichia buccalis (Robin 1853) Trevisan 1879 is the type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically adequately accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. Species of Leptotrichia are large, fusiform, non-motile, non-sporulating rods, which often populate the human oral flora. L. buccalis is anaerobic to aerotolerant, and saccharolytic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order 'Fusobacteriales' and no more than the second sequence from the phylum 'Fusobacteria'. The 2,465,610 bp long single replicon genome with its 2,306 protein-coding and 61 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  14. Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T)

    PubMed Central

    Chertkov, Olga; Sikorski, Johannes; Brambilla, Evelyne; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Spring, Stefan; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304712

  15. Deep sequencing of 10,000 human genomes

    PubMed Central

    Pierce, Levi C. T.; Biggs, William H.; di Iulio, Julia; Wong, Emily H. M.; Fabani, Martin M.; Kirkness, Ewen F.; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C.; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A.; Och, Franz J.; Turpaz, Yaron; Venter, J. Craig

    2016-01-01

    We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon