Note: This page contains sample records for the topic genome sequencing centers from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results.
Last update: November 12, 2013.
1

Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students  

ERIC Educational Resources Information Center

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

2005-01-01

2

Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)  

SciTech Connect

John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

Crow, John [National Center for Genome Resources]

2012-06-01

3

Genome Characterization Centers  

Cancer.gov

Genomics is a fast-moving field with novel technologies and platforms that help characterize the genome being made available to the research community on a continual basis. The Cancer Genome Atlas (TCGA) Genome Characterization Centers (GCCs) are responsible for characterizing all of the genomic changes found in the tumors studied as part of the TCGA program.

4

Human Genome Center  

NSDL National Science Digital Library

Human Genome Center At Lawrence Berkeley Lab (LBL), Berkeley, California: offering information about projects in Biology, Informatics and Instrumentation, photos of LBL robotic instruments, software, and online access to one LBL genomic database.

5

Sequencing technologies and genome sequencing  

Microsoft Academic Search

The high-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in the field of human and animal genomics research, and can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing developments of high throughput sequencing machines and advancement of modern bioinformatics tools at unprecedented pace,

Chandra Shekhar Pareek; Rafal Smoczynski; Andrzej Tretyn

6

Multiplexed Fragaria Chloroplast Genome Sequencing  

Technology Transfer Automated Retrieval System (TEKTRAN)

A method to sequence multiple chloroplast genomes that uses the sequencing depth of ultra high throughput sequencing technologies was recently described. Sequencing complete chloroplast genomes can resolve phylogenetic relationships at low taxonomic levels and identify point mutations and indels tha...

7

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: (Specific Aim 1) sequence 3.5 Mb of P. falciparum genomic DNA; (Specific Aim 2) annotate the sequence; (Specific Aim 3) release the information to the...

M. J. Gardner

2003-01-01

8

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the sc...

M. J. Gardner

2001-01-01

9

Porcine Genomic Sequencing Initiative  

Microsoft Academic Search

A. Specific biological rationales for the utility of the porcine sequence information Rationale and Objectives. Completion of the human genome sequence provides the starting point for understanding the genetic complexity of humans and how genetic variation contributes to diverse phenotypes and disease. It is clear that model organisms have played an invaluable role in the synthesis of this understanding. It

Gary Rohrer; Jonathan E. Beever; Max F. Rothschild; Lawrence Schook; Richard Gibbs; George Weinstock; W. Gregory

10

Genome Sequence Databases (Overview): Sequencing and Assembly  

SciTech Connect

From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

Lapidus, Alla L.

2009-01-01
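
The two error rates quoted in the abstract above (about 1 error per 2,000 bp for a draft consensus and roughly 1 error per 10,000 bp for a finished genome) can be turned into expected error counts. The following is only a minimal sketch: the function name is hypothetical, errors are assumed to be spread uniformly, and the 5 Mb genome size is an assumed example value that does not come from the record.

```python
def expected_errors(genome_size_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors, assuming errors occur
    uniformly at a rate of one per `bp_per_error` bases."""
    return genome_size_bp / bp_per_error


# Rates quoted in the abstract.
DRAFT_BP_PER_ERROR = 2_000       # draft assembly: ~1 error per 2,000 bp
FINISHED_BP_PER_ERROR = 10_000   # finished genome: ~1 error per 10,000 bp

# Assumed example genome size (not from the source record).
genome_size = 5_000_000

draft = expected_errors(genome_size, DRAFT_BP_PER_ERROR)
finished = expected_errors(genome_size, FINISHED_BP_PER_ERROR)
print(f"draft: {draft:.0f} expected errors; finished: {finished:.0f} expected errors")
```

Under these assumptions, finishing reduces the expected number of consensus errors fivefold, whatever the genome size.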

11

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Schadt, Christopher Warren [ORNL]; Baker, Scott [Pacific Northwest National Laboratory (PNNL)]; Thykaer, Jette [Pacific Northwest National Laboratory (PNNL)]; Adney, William S [National Renewable Energy Laboratory (NREL)]; Brettin, Tom [Los Alamos National Laboratory (LANL)]; Brockman, Fred [Pacific Northwest National Laboratory (PNNL)]; Dhaeseleer, Patrick [Lawrence Livermore National Laboratory (LLNL)]; Martinez, A Diego [Los Alamos National Laboratory (LANL)]; Miller, R Michael [Argonne National Laboratory (ANL)]; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute]; Torok, Tamas [U.S. Department of Energy, Joint Genome Institute]; Tuskan, Gerald A [ORNL]; Bennett, Joan [Rutgers University]; Berka, Randy [Novozymes, Inc]; Briggs, Steven [University of California, San Diego]; Heitman, Joseph [Duke University]; Rizvi, L [Royal Ontario Museum]; Taylor, John [University of California, Berkeley]; Turgeon, Gillian [Cornell University]; Werner-Washburne, Maggie [University of New Mexico, Albuquerque]; Himmel, Michael [ORNL]

2008-01-01

12

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

2008-09-30

13

MIPS: a database for protein sequences and complete genomes  

Microsoft Academic Search

The MIPS group (Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis

Hans-Werner Mewes; Jean Hani; Friedhelm Pfeiffer; Dmitrij Frishman

1998-01-01

14

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several

Hans-Werner Mewes; Dmitrij Frishman; Christian Gruber; Birgitta Geier; Dirk Haase; Andreas Kaps; Kai Lemcke; Gertrud Mannhaupt; Friedhelm Pfeiffer; Christine M. Schüller; S. Stocker; B. Weil

2000-01-01

15

Whole Genome Sequencing Program (WGS)  

Center for Food Safety and Applied Nutrition (CFSAN)

... Read FDA's article in The New England Journal of Medicine (March 2011) about how genome sequencing helped resolve a salmonellosis outbreak ... More results from www.fda.gov/food/foodscienceresearch/wholegenomesequencingprogramwgs

16

Integrating sequence, evolution and functional genomics in regulatory genomics  

PubMed Central

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.

Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

2009-01-01

17

Evidence from genome-wide simple sequence repeat markers for a polyphyletic origin and secondary centers of genetic diversity of Brassica juncea in China and India.  

PubMed

The oilseed Brassica juncea is an important crop with a long history of cultivation in India and China. Previous studies have suggested a polyphyletic origin of B. juncea and more than one migration from the primary to secondary centers of diversity. We investigated molecular genetic diversity based on 99 simple sequence repeat markers in 119 oilseed B. juncea varieties from China, India, Europe, and Australia to test whether molecular differentiation follows Vavilov's proposal of secondary centers of diversity in India and China. Two distinct groups were identified by markers in the A genome, and the same two groups were confirmed by markers in the B genome. Group 1 included accessions from central and western India, in addition to those from eastern China. Group 2 included accessions from central and western China, as well as those from northern and eastern India. European and Australian accessions were found only in Group 2. Chinese accessions had higher allelic diversity per accession (Group 1) and more private alleles per accession (Groups 1 and 2) than those from India. The marker data and geographic distribution of Groups 1 and 2 were consistent with two independent migrations of B. juncea from its center of origin in the Middle East and neighboring regions along trade routes to western China and northern India, followed by regional adaptation. Group 1 migrated further south and west in India, and further east in China, than Group 2. Group 2 showed diverse agroecological adaptation, with yellow-seeded spring-sown types in central and western China and brown-seeded autumn-sown types in India. PMID:23519868

Chen, Sheng; Wan, Zhenjie; Nelson, Matthew N; Chauhan, Jitendra S; Redden, Robert; Burton, Wayne A; Lin, Ping; Salisbury, Phillip A; Fu, Tingdong; Cowling, Wallace A

2013-03-21

18

The Fungal Genome Initiative and Lessons Learned from Genome Sequencing  

Microsoft Academic Search

The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The “fungal genome initiative” represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal

Christina A. Cuomo; Bruce W. Birren

2010-01-01

19

Genome sequencing for healthy individuals.  

PubMed

Genome sequencing of healthy individuals has the potential to lead to improved well-being and disease prevention, but numerous challenges remain that must be addressed to realize these benefits and, importantly, these benefits must be equitable across society. PMID:24035073

Sanderson, Saskia C

2013-09-11

20

Center for Eukaryotic Structural Genomics  

NSDL National Science Digital Library

A collaboration between the Department of Biochemistry at the University of Wisconsin-Madison, the Medical College of Wisconsin, Molecular Kinetics, Inc., and Hebrew University, the Center for Eukaryotic Structural Genomics (CESG) intends to "develop critical technologies for determining three-dimensional structures of proteins rapidly and economically." The site gives an overview of CESG, including the goals and mission of the center, biographies of people involved, and the methodology and results of the program. The results section is the most substantial part of the site, giving information on how target proteins were selected, protocols and technology used, publications based on CESG research, and more.

21

Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission  

PubMed Central

Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain may be of biological origin or are simply caused by differences in genome assembly procedures.

Giongo, Adriana; Tyler, Heather L.; Zipperer, Ursula N.; Triplett, Eric W.

2010-01-01

22

THE TRIBOLIUM GENOME SEQUENCING PROJECT AND ITS IMPLICATIONS FOR DIABROTICA GENETICS RESEARCH  

Technology Transfer Automated Retrieval System (TEKTRAN)

The genome of Tribolium castaneum, the red flour beetle, is currently being sequenced at the Human Genome Sequencing Center, Baylor College of Medicine. Tribolium is the only beetle whose genome will have been entirely sequenced. For this reason the Tribolium genome project is of particular releva...

23

Genome Sequence of Burkholderia pseudomallei NCTC 13392.  

PubMed

Here, we describe the draft genome sequence of Burkholderia pseudomallei NCTC 13392. This isolate has been distributed as K96243, but distinct genomic differences have been identified. The genomic sequence of this isolate will provide the genomic context for previously conducted functional studies. PMID:23704173

Sahl, Jason W; Stone, Joshua K; Gelhaus, H Carl; Warren, Richard L; Cruttwell, Caroline J; Funnell, Simon G; Keim, Paul; Tuanyok, Apichai

2013-05-23

24

Genome Sequence of Burkholderia pseudomallei NCTC 13392  

PubMed Central

Here, we describe the draft genome sequence of Burkholderia pseudomallei NCTC 13392. This isolate has been distributed as K96243, but distinct genomic differences have been identified. The genomic sequence of this isolate will provide the genomic context for previously conducted functional studies.

Sahl, Jason W.; Stone, Joshua K.; Gelhaus, H. Carl; Warren, Richard L.; Cruttwell, Caroline J.; Funnell, Simon G.; Keim, Paul

2013-01-01

25

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein

Hans-Werner Mewes; Dmitrij Frishman; Ulrich Güldener; Gertrud Mannhaupt; Klaus F. X. Mayer; Martin Mokrejs; Burkhard Morgenstern; Martin Münsterkötter; Stephen Rudd; B. Weil

2002-01-01

26

Sequence and analysis of the Arabidopsis genome  

Microsoft Academic Search

The comprehensive analysis of the genome sequence of the plant Arabidopsis thaliana has been completed recently. The genome sequence and associated analyses provide the foundations for rapid progress in many fields of plant research, such as the exploitation of genetic variation in Arabidopsis ecotypes, the assessment of the transcriptome and proteome, and the association of genome changes at the sequence

Michael Bevan; Klaus Mayer; Owen White; Jonathan A Eisen; Daphne Preuss; Thomas Bureau; Steven L Salzberg; Hans-Werner Mewes

2001-01-01

27

GENOME SEQUENCING AND ANALYSIS OF ASPERGILLUS ORYZAE  

Technology Transfer Automated Retrieval System (TEKTRAN)

The genome of Aspergillus oryzae, an important industrial fungus used in the production of oriental fermented foods, such as soy sauce, miso, and sake, has been sequenced. The genome sequence reveals a wealth of genes encoding secreted enzymes. A comparison with the genome sequences of A. nidulans...

28

Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project  

Microsoft Academic Search

Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity

Mark D. Adams; Jenny M. Kelley; Jeannine D. Gocayne; Mark Dubnick; Mihael H. Polymeropoulos; Hong Xiao; Carl R. Merril; Andrew Wu; Bjorn Olde; Ruben F. Moreno; Anthony R. Kerlavage; W. Richard McCombie; J. Craig Venter

1991-01-01

29

Genomics Activities - Center for Biologics Evaluation and ...  

Center for Biologics Evaluation and Research (CBER)

Genomics Activities Center for Biologics Evaluation and Research ... New Activities at CBER Supporting Genomics Research ... More results from www.fda.gov/downloads/advisorycommittees/committeesmeetingmaterials

30

Genome sequencing and functional genomics approaches in tomato  

Microsoft Academic Search

Tomato genome sequencing has been taking place through an international, 10-year initiative entitled the “International Solanaceae Genome Project” (SOL). The strategy proposed by the SOL consortium is to sequence the approximately 220 Mb of euchromatin that contains the majority of genes, rather than the entire tomato genome. Tomato and other Solanaceae plants have unique developmental aspects, such as the formation of

Daisuke Shibata

2005-01-01

31

Sequencing Intractable DNA to Close Microbial Genomes  

SciTech Connect

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

2012-01-01

32

SP8 Sequencing Extinct Genomes  

PubMed Central

Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are comparatively weak molecules from other cellular debris, and thus evolutionary biologists are in essence time trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. Sequestering of DNA molecules in humic or Maillard-like complexes likely represents a rich source of DNA molecules from the past, which have yet to be tapped. These molecules were impossible to acquire due to the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, which, in theory, allow for the identification of all short nucleotide sequences in a sample in a non-selective approach, have the potential to allow the identification of all nucleic acids in a sample, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our recent efforts underway to sequence the wooly mammoth genome as well as other fossil remains.

Poinar, H.

2007-01-01

33

An Intelligent System for Searching Genomic Sequences  

Microsoft Academic Search

In this paper, we have developed an intelligent system for searching comparative genomic sequences which departs from the traditional sequence alignment methods of nucleic residues or alphabets. Instead, we use the composition vector method that exploits pattern structures in sequences and indexing techniques for building a genomic database of prokaryotic organisms and their phylogenetic relationships. For the structural analysis of

Vandana Gummuluru; Su-shing Chen

2007-01-01
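
The abstract above contrasts alignment-based search with comparing sequences by their composition. As a rough illustration of that general idea only, the sketch below builds plain k-mer frequency vectors and compares them by cosine similarity. It is a simplification and not the authors' system: the abstract does not give the details of their composition vector method or indexing, and the function names, the choice of k, and the toy sequences are all assumptions.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every overlapping k-mer in `seq`."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(value * b.get(kmer, 0.0) for kmer, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one sequence was shorter than k
    return dot / (norm_a * norm_b)


# Toy sequences (assumed, for illustration only).
query = kmer_frequencies("ACGTACGTACGT", k=3)
similar = kmer_frequencies("ACGTACGTACGA", k=3)
different = kmer_frequencies("GGGGGGCCCCCC", k=3)

print(cosine_similarity(query, similar), cosine_similarity(query, different))
```

Because the vectors depend only on k-mer content, two sequences can be compared without aligning them, which is what makes indexing a database of such vectors feasible.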

34

INTEGRATION OF THE RECOMBINATION AND PHYSICAL MAPS WITH THE GENOME SEQUENCE OF TRIBOLIUM CASTANEUM  

Technology Transfer Automated Retrieval System (TEKTRAN)

The final assembly of the Tribolium genome sequence and its integration with genetic and physical mapping data is nearing completion. Release 2 of the genome assembly by the Baylor College of Medicine’s Human Genome Sequencing Center consists of 420 sequence scaffolds which encompass >95% of the cl...

35

Accurate and comprehensive sequencing of personal genomes.  

PubMed

As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ~30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported. PMID:21771779

Ajay, Subramanian S; Parker, Stephen C J; Abaan, Hatice Ozel; Fajardo, Karin V Fuentes; Margulies, Elliott H

2011-07-19
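
The coverage figures in the abstract above (about 30× as the usual recommendation, 126× in the study) follow the standard definition of mean coverage as total sequenced bases divided by genome size. The sketch below shows that arithmetic; the 3.1 Gb genome size and 100 bp read length are assumed example values, not figures from the record, and the function names are hypothetical.

```python
def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean coverage = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: int, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads whose total length reaches the target coverage."""
    total_bases = target_coverage * genome_size_bp
    return -(-total_bases // read_length_bp)  # integer ceiling division


# Assumed example values (not from the source record).
GENOME_SIZE = 3_100_000_000  # approximate human genome size in bp
READ_LENGTH = 100            # bp per read

for coverage in (30, 126):   # the two depths mentioned in the abstract
    n = reads_needed(coverage, READ_LENGTH, GENOME_SIZE)
    print(f"{coverage}x coverage needs about {n:,} reads of {READ_LENGTH} bp")
```

Mean coverage says nothing about how evenly reads are distributed, which is why the abstract argues for reporting the callable portion of the genome as well.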

36

GAIA: framework annotation of genomic sequence.  

PubMed

As increasing amounts of genomic sequence from many organisms become available, and as DNA sequences become a primary reagent in biologic investigations, the role of annotation as a prospective guide for laboratory experiments will expand rapidly. Here we describe a process of high-throughput, reliable annotation, called framework annotation, which is designed to provide a foundation for initial biologic characterization of previously unexamined sequence. To examine this concept in practice, we have constructed Genome Annotation and Information Analysis (GAIA), a prototype software architecture that implements several elements important for framework annotation. The center of GAIA consists of an annotation database and the associated data management subsystem that forms the software bus along which other components communicate. The schema for this database defines three principal concepts: (1) Entries, consisting of sequence and associated historical data; (2) Features, comprising information of biologic interest; and (3) Experiments, describing the evidence that supports Features. The database permits tracking of annotation results over time, as well as assessment of the reliability of particular results. New framework annotation is produced by CARTA, a set of autonomous sensors that perform automatic analyses and assert results into the annotation database. These results are available via a Web-based query interface that uses graphical Java applets as well as text-based HTML pages to display data at different levels of resolution and permit interactive exploration of annotation. We present results for initial application of framework annotation to a set of test sequences, demonstrating its effectiveness in providing a starting point for biologic investigation, and discuss ways in which the current prototype can be improved. The prototype is available for public use and comment at http://www.cbil.upenn.edu/gaia. PMID:9521927

Bailey, L C; Fischer, S; Schug, J; Crabtree, J; Gibson, M; Overton, G C

1998-03-01
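
The abstract above names three principal concepts in the GAIA annotation schema: Entries, Features, and Experiments. A hypothetical sketch of how those concepts could relate is given below. Only the three concept names and their one-line descriptions come from the abstract; every field name, the helper function, and the sample values are assumptions made for illustration and do not reflect the actual GAIA database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """A sequence together with its associated historical data."""
    entry_id: str
    sequence: str
    history: List[str] = field(default_factory=list)  # hypothetical field


@dataclass
class Experiment:
    """Evidence that supports a Feature."""
    method: str   # hypothetical: name of the analysis that was run
    result: str   # hypothetical: summary of its outcome


@dataclass
class Feature:
    """Information of biologic interest located on an Entry."""
    entry_id: str
    start: int
    end: int
    description: str
    evidence: List[Experiment] = field(default_factory=list)


def supported_features(features: List[Feature]) -> List[Feature]:
    """Keep only features backed by at least one Experiment."""
    return [f for f in features if f.evidence]


entry = Entry("E1", "ATGGCCTAA", history=["loaded"])
gene = Feature("E1", 0, 9, "putative ORF",
               evidence=[Experiment("orf-scan", "start and stop codon found")])
unsupported = Feature("E1", 3, 6, "unverified motif")
print([f.description for f in supported_features([gene, unsupported])])
```

Linking each Feature to the Experiments behind it is what would allow the reliability of an annotation to be assessed and tracked over time, as the abstract describes.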

37

The Characterization of Twenty Sequenced Human Genomes  

Microsoft Academic Search

We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof

Kimberly Pelak; Kevin V. Shianna; Dongliang Ge; Jessica M. Maia; Mingfu Zhu; Jason P. Smith; Elizabeth T. Cirulli; Jacques Fellay; Samuel P. Dickson; Curtis E. Gumbs; Erin L. Heinzen; Anna C. Need; Elizabeth K. Ruzzo; Abanish Singh; C. Ryan Campbell; Linda K. Hong; Katharina A. Lornsen; Alexander M. McKenzie; Nara L. M. Sobreira; Julie E. Hoover-Fong; Joshua D. Milner; Ruth Ottman; Barton F. Haynes; James J. Goedert; David B. Goldstein

2010-01-01

38

Complete Genome Sequence of Mycobacterium massiliense  

PubMed Central

Mycobacterium massiliense is a rapidly growing bacterium associated with opportunistic infections. The genome of a representative isolate (strain GO 06) recovered from wound samples from patients who underwent arthroscopic or laparoscopic surgery was sequenced. To the best of our knowledge, this is the first announcement of the complete genome sequence of an M. massiliense strain.

Raiol, Taina; Ribeiro, Guilherme Menegoi; Maranhao, Andrea Queiroz; Bocca, Anamelia Lorenzetti; Silva-Pereira, Ildinete; Junqueira-Kipnis, Ana Paula; Brigido, Marcelo de Macedo

2012-01-01

39

The sequence of the human genome.  

PubMed

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge. PMID:11181995
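
The coverage and polymorphism figures quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative and not taken from the paper, and the only inputs are the numbers stated in the abstract.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
TOTAL_READ_BP = 14.8e9          # total bases in the shotgun reads
NUM_READS = 27_271_853          # high-quality sequence reads
REPORTED_COVERAGE = 5.11        # fold coverage reported for those reads
SHREDDED_PUBLIC_COVERAGE = 2.9  # fold coverage added by shredded public data
CONSENSUS_BP = 2.91e9           # length of the consensus sequence
BP_PER_DIFFERENCE = 1250        # one difference per 1250 bp between two haploid genomes


def fold_coverage(total_bp: float, genome_bp: float) -> float:
    """Fold coverage is the total sequenced bases divided by the genome size."""
    return total_bp / genome_bp


mean_read_length = TOTAL_READ_BP / NUM_READS
coverage_vs_consensus = fold_coverage(TOTAL_READ_BP, CONSENSUS_BP)
combined_coverage = REPORTED_COVERAGE + SHREDDED_PUBLIC_COVERAGE
expected_pair_differences = CONSENSUS_BP / BP_PER_DIFFERENCE

print(f"mean read length:          {mean_read_length:.0f} bp")
print(f"coverage vs. consensus:    {coverage_vs_consensus:.2f}-fold")
print(f"combined coverage:         {combined_coverage:.2f}-fold")
print(f"differences per genome pair: {expected_pair_differences:,.0f}")
```

On these numbers the reads average roughly 543 bp, the reads cover the 2.91-billion bp consensus about 5.09-fold (close to the reported 5.11-fold; the abstract does not explain the small difference), the combined coverage is about eightfold as stated, and two haploid genomes would be expected to differ at roughly 2.3 million positions.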

Venter, J C; Adams, M D; Myers, E W; Li, P W; Mural, R J; Sutton, G G; Smith, H O; Yandell, M; Evans, C A; Holt, R A; Gocayne, J D; Amanatides, P; Ballew, R M; Huson, D H; Wortman, J R; Zhang, Q; Kodira, C D; Zheng, X H; Chen, L; Skupski, M; Subramanian, G; Thomas, P D; Zhang, J; Gabor Miklos, G L; Nelson, C; Broder, S; Clark, A G; Nadeau, J; McKusick, V A; Zinder, N; Levine, A J; Roberts, R J; Simon, M; Slayman, C; Hunkapiller, M; Bolanos, R; Delcher, A; Dew, I; Fasulo, D; Flanigan, M; Florea, L; Halpern, A; Hannenhalli, S; Kravitz, S; Levy, S; Mobarry, C; Reinert, K; Remington, K; Abu-Threideh, J; Beasley, E; Biddick, K; Bonazzi, V; Brandon, R; Cargill, M; Chandramouliswaran, I; Charlab, R; Chaturvedi, K; Deng, Z; Di Francesco, V; Dunn, P; Eilbeck, K; Evangelista, C; Gabrielian, A E; Gan, W; Ge, W; Gong, F; Gu, Z; Guan, P; Heiman, T J; Higgins, M E; Ji, R R; Ke, Z; Ketchum, K A; Lai, Z; Lei, Y; Li, Z; Li, J; Liang, Y; Lin, X; Lu, F; Merkulov, G V; Milshina, N; Moore, H M; Naik, A K; Narayan, V A; Neelam, B; Nusskern, D; Rusch, D B; Salzberg, S; Shao, W; Shue, B; Sun, J; Wang, Z; Wang, A; Wang, X; Wang, J; Wei, M; Wides, R; Xiao, C; Yan, C; Yao, A; Ye, J; Zhan, M; Zhang, W; Zhang, H; Zhao, Q; Zheng, L; Zhong, F; Zhong, W; Zhu, S; Zhao, S; Gilbert, D; Baumhueter, S; Spier, G; Carter, C; Cravchik, A; Woodage, T; Ali, F; An, H; Awe, A; Baldwin, D; Baden, H; Barnstead, M; Barrow, I; Beeson, K; Busam, D; Carver, A; Center, A; Cheng, M L; Curry, L; Danaher, S; Davenport, L; Desilets, R; Dietz, S; Dodson, K; Doup, L; Ferriera, S; Garg, N; Gluecksmann, A; Hart, B; Haynes, J; Haynes, C; Heiner, C; Hladun, S; Hostin, D; Houck, J; Howland, T; Ibegwam, C; Johnson, J; Kalush, F; Kline, L; Koduru, S; Love, A; Mann, F; May, D; McCawley, S; McIntosh, T; McMullen, I; Moy, M; Moy, L; Murphy, B; Nelson, K; Pfannkoch, C; Pratts, E; Puri, V; Qureshi, H; Reardon, M; Rodriguez, R; Rogers, Y H; Romblad, D; Ruhfel, B; Scott, R; Sitter, C; Smallwood, M; Stewart, E; Strong, R; Suh, E; 
Thomas, R; Tint, N N; Tse, S; Vech, C; Wang, G; Wetter, J; Williams, S; Williams, M; Windsor, S; Winn-Deen, E; Wolfe, K; Zaveri, J; Zaveri, K; Abril, J F; Guigó, R; Campbell, M J; Sjolander, K V; Karlak, B; Kejariwal, A; Mi, H; Lazareva, B; Hatton, T; Narechania, A; Diemer, K; Muruganujan, A; Guo, N; Sato, S; Bafna, V; Istrail, S; Lippert, R; Schwartz, R; Walenz, B; Yooseph, S; Allen, D; Basu, A; Baxendale, J; Blick, L; Caminha, M; Carnes-Stine, J; Caulk, P; Chiang, Y H; Coyne, M; Dahlke, C; Mays, A; Dombroski, M; Donnelly, M; Ely, D; Esparham, S; Fosler, C; Gire, H; Glanowski, S; Glasser, K; Glodek, A; Gorokhov, M; Graham, K; Gropman, B; Harris, M; Heil, J; Henderson, S; Hoover, J; Jennings, D; Jordan, C; Jordan, J; Kasha, J; Kagan, L; Kraft, C; Levitsky, A; Lewis, M; Liu, X; Lopez, J; Ma, D; Majoros, W; McDaniel, J; Murphy, S; Newman, M; Nguyen, T; Nguyen, N; Nodell, M; Pan, S; Peck, J; Peterson, M; Rowe, W; Sanders, R; Scott, J; Simpson, M; Smith, T; Sprague, A; Stockwell, T; Turner, R; Venter, E; Wang, M; Wen, M; Wu, D; Wu, M; Xia, A; Zandieh, A; Zhu, X

2001-02-16

40

Human genome sequencing in health and disease.  

PubMed

Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

2012-01-01

41

Human Genome Sequencing in Health and Disease  

PubMed Central

Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

2013-01-01

42

Genome Sequencing and Analysis Conference IV  

SciTech Connect

J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

Not Available

1993-12-31

43

Engineering in genomics-automating the Genome Center  

Microsoft Academic Search

Engineering enters genomics principally through the development of hardware and software tools or processes (operations research) to aid the biologist/geneticist to take different, more, or higher quality data. The Human Genome Project, in order to meet its goals for mapping and sequencing, is pushing to advance the state of the art in instrumentation, automation and computational biology. The focus of

Harold Garner

1994-01-01

44

Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes  

PubMed Central

Background Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. Methodology/Principal Findings For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. Conclusions/Significance Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.
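
The read-simulation step described above can be illustrated with a toy example. This is a minimal sketch, assuming error-free reads and a fixed fragment length; the function name and parameters are hypothetical and are not Plantagora's own code, which mimicked 454 and Illumina reads in ways the abstract does not detail.

```python
import random

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def simulate_paired_reads(genome, read_len, insert_size, n_pairs, seed=0):
    """Sample error-free read pairs from both ends of random fragments.

    Each fragment is insert_size bases long; the first read is its 5' end on
    the forward strand and the second read is the reverse complement of its
    3' end, as in a typical paired-end layout.
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # randint is inclusive at both ends, so every fragment fits the genome.
        start = rng.randint(0, len(genome) - insert_size)
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_len]
        reverse = fragment[-read_len:].translate(_COMPLEMENT)[::-1]
        pairs.append((forward, reverse))
    return pairs
```

Varying `insert_size` across runs corresponds to the "varying paired end spacing" mentioned in the abstract; a realistic simulator would also model platform-specific error profiles and read-length distributions.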

Barthelson, Roger; McFarlin, Adam J.; Rounsley, Steven D.; Young, Sarah

2011-01-01

45

Next-generation sequencing: applications beyond genomes  

Microsoft Academic Search

The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different

Samuel Marguerat; Jürg Bähler

2008-01-01

46

Progress in Arabidopsis genome sequencing and functional genomics.  

PubMed

Arabidopsis thaliana has a relatively small genome of approximately 130 Mb containing about 10% repetitive DNA. Genome sequencing studies reveal a gene-rich genome, predicted to contain approximately 25000 genes spaced on average every 4.5 kb. Between 10 and 20% of the predicted genes occur as clusters of related genes, indicating that local sequence duplication and subsequent divergence generates a significant proportion of gene families. In addition to gene families, repetitive sequences comprise individual and small clusters of two to three retroelements and other classes of smaller repeats. The clustering of highly repetitive elements is a striking feature of the A. thaliana genome emerging from sequence and other analyses. PMID:10751689

Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansroge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Bountry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Rose, M

2000-03-31

47

Coupled amplification and sequencing of genomic DNA  

SciTech Connect

Addition of dideoxyribonucleotides during the exponential phase of the PCR should result in the synthesis of two complementary sequence ladders. The authors have explored this hypothesis to develop coupled amplification and sequencing of genomic DNA. Coupled amplification and sequencing is a biphasic method for sequencing both strands of template as they are amplified. Stage I selects and amplifies a single target from the genomic DNA sample. Stage II accomplishes the sequencing as well as additional amplification of the target using aliquots from the stage I reaction mixed with end-labeled primer and dideoxynucleotides. They have successfully applied coupled amplification and sequencing to a 300-base-pair fragment 4 kilobases upstream from HOX2B directly from human whole genomic DNA.

Ruano, G.; Kidd, K.K. (Yale Univ. School of Medicine, New Haven, CT (United States))

1991-04-01

48

Whole genome shotgun sequencing guided by bioinformatics pipelines—an optimized approach for an established technique  

Microsoft Academic Search

While the sequencing of bacterial genomes has become a routine procedure at major sequencing centers, there are still a number of genome projects at small- or medium-size facilities. For these facilities a maximum of control over sequencing, assembling and finishing is essential. At the same time, facilities have to be able to co-operate at minimum costs for the overall project.

Olaf Kaiser; Daniela Bartels; Thomas Bekel; Alexander Goesmann; Sebastian Kespohl; Alfred Pühler; Folker Meyer

2003-01-01

49

Genome Sequence of Lactobacillus crispatus ST1  

PubMed Central

Lactobacillus crispatus is a common member of the beneficial microbiota present in the vertebrate gastrointestinal and human genitourinary tracts. Here, we report the genome sequence of L. crispatus ST1, a chicken isolate displaying strong adherence to vaginal epithelial cells.

Ojala, Teija; Kuparinen, Veera; Koskinen, J. Patrik; Alatalo, Edward; Holm, Liisa; Auvinen, Petri; Edelman, Sanna; Westerlund-Wikstrom, Benita; Korhonen, Timo K.; Paulin, Lars; Kankainen, Matti

2010-01-01

50

Virtually sequenced: The next genomic generation  

SciTech Connect

The announcement of "virtual genomics" requires evaluation of the efficiency and accuracy of computer-generated sequencing efforts. "Digital Northerns", or Northern blot electrophoresis done in the realm of computer data, have been developed by Incyte Pharmaceuticals (Palo Alto, CA) and Human Genome Sciences (Rockville, MD). 12 refs., 2 figs.

Bains, W. [PA Consulting Group, Melbourn (United Kingdom)]

1996-06-01

51

Finding approximate tandem repeats in genomic sequences  

Microsoft Academic Search

An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and examined and its effectiveness on genomic data is demonstrated.
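
To make the notion of an approximate tandem repeat concrete, here is a minimal sketch in Python. It is a naive stand-in, not the statistical model of the paper: it assumes a fixed period, allows only substitutions (no insertions or deletions), and accepts a run when each unit differs from the previous one in at most `max_mismatch` positions. All names and thresholds are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_approx_tandem_repeats(seq, period, min_copies=3, max_mismatch=1):
    """Return (start, copies) for runs of adjacent, near-identical units.

    A run is extended while the next period-length unit is within
    max_mismatch substitutions of the unit before it.
    """
    hits = []
    i = 0
    while i + 2 * period <= len(seq):
        copies = 1
        j = i
        while (j + 2 * period <= len(seq)
               and hamming(seq[j:j + period],
                           seq[j + period:j + 2 * period]) <= max_mismatch):
            copies += 1
            j += period
        if copies >= min_copies:
            hits.append((i, copies))
            i = j + period  # resume the scan after the reported run
        else:
            i += 1
    return hits
```

Because mismatches are tolerated, the reported start of a run can fall a base or two before the first clean unit. A statistical model such as the one the abstract describes is a more principled way to decide where a repeat begins and ends.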

Ydo Wexler; Zohar Yakhini; Yechezkel Kashi; Dan Geiger

2004-01-01

52

Genomic sequencing of Pleistocene cave bears.  

PubMed

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of approximately 1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs. PMID:15933159

Noonan, James P; Hofreiter, Michael; Smith, Doug; Priest, James R; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J Chris; Pääbo, Svante; Rubin, Edward M

2005-06-02

53

Intraspecies sequence comparisons for annotating genomes.  

PubMed

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intraspecies sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals were amplified, resequenced, and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom. It also raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. PMID:15545499
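
The idea of locating constrained regions from within-species variation can be sketched as follows. This is a simplified illustration, assuming pre-aligned, gap-free sequences of equal length; the variability measure, window size and cutoff are hypothetical choices, since the abstract does not specify how the authors estimated per-nucleotide mutation rates.

```python
from collections import Counter


def column_variability(alignment):
    """Per column, the fraction of sequences differing from the commonest base."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same length")
    rates = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        rates.append(1 - most_common_count / len(column))
    return rates


def conserved_windows(rates, window, max_mean_rate):
    """Start indices of windows whose mean variability is at or below the cutoff."""
    return [
        start
        for start in range(len(rates) - window + 1)
        if sum(rates[start:start + window]) / window <= max_mean_rate
    ]
```

Windows returned by `conserved_windows` are candidates for functionally constrained sequence in the sense used above; with real data the cutoff would need calibrating against the background polymorphism rate.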

Boffelli, Dario; Weer, Claire V; Weng, Li; Lewis, Keith D; Shoukry, Malak I; Pachter, Lior; Keys, David N; Rubin, Edward M

2004-11-15

54

Complete genome sequence of arracacha mottle virus.  

PubMed

Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2% nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses. PMID:23001696

Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

2012-09-22

55

Genomic Sequencing of Single Microbial Cells from Environmental Samples  

SciTech Connect

Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

2008-02-01

56

Sorghum Genome Sequencing by Methylation Filtration  

Microsoft Academic Search

Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged,

Joseph A Bedell; Muhammad A Budiman; Andrew Nunberg; Robert W Citek; Dan Robbins; Joshua Jones; Elizabeth Flick; Theresa Rohlfing; Jason Fries; Kourtney Bradford; Jennifer McMenamy; Michael Smith; Heather Holeman; Bruce A Roe; Graham Wiley; Ian F Korf; Pablo D Rabinowicz; Nathan Lakey; W. Richard McCombie; Jeffrey A Jeddeloh; Robert A Martienssen

2005-01-01

57

Computational Genomics: From Genome Sequence To Global Gene Regulation  

NASA Astrophysics Data System (ADS)

As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.

Li, Hao

2000-03-01

58

Latent Periodicities in Genome Sequences  

Microsoft Academic Search

A novel approach is presented for the detection of periodicities in DNA sequences. A DNA sequence can be modelled as a nonstationary stochastic process that exhibits various statistical periodicities over different regions. The coding part of the DNA, for instance, exhibits statistical periodicity with period three. Such regions in DNA are modelled as generated from a collection of information sources
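
The period-three signal in coding DNA mentioned above can be illustrated with a classical Fourier measure. This sketch is not the information-source model of the paper; it assumes the common textbook approach of taking the discrete Fourier component at frequency 1/3 of each base's indicator sequence, and the function name is hypothetical.

```python
import cmath


def period3_power(seq):
    """Spectral power at period 3, summed over the four base indicators.

    For each base, the indicator sequence is 1 where that base occurs and 0
    elsewhere. The result is normalised by the sequence length.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier component of this base's indicator sequence at frequency 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n
```

A sequence with strong codon-position bias gives a large value, while a sequence with no period-three structure gives a small one. Evaluating the function in a sliding window is one simple way to see the statistic vary over different regions, which is the nonstationarity the abstract refers to.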

Raman Arora; William A. Sethares; James A. Bucklew

2008-01-01

59

International Rice Genome Sequencing Project: the effort to completely sequence the rice genome  

Microsoft Academic Search

The International Rice Genome Sequencing Project (IRGSP) involves researchers from ten countries who are working to completely and accurately sequence the rice genome within a short period. Sequencing uses a map-based clone-by-clone shotgun strategy; shared bacterial artificial chromosome/P1-derived artificial chromosome libraries have been constructed from Oryza sativa ssp. japonica variety ‘Nipponbare’. End-sequencing, fingerprinting and marker-aided PCR screening are being

Takuji Sasaki; Benjamin Burr

2000-01-01

60

Genome Sequencing, Assembly and Gene Prediction in Fungi  

Microsoft Academic Search

Genome sequencing and the science of genomics is now being applied to the study of fungi. Although resources have been slow in coming, a number of fungi are now being sequenced and an increasingly diverse array of these organisms are being considered as candidates for whole genome sequencing. Currently there are only two complete fungal genome sequences available, those of

Brendan Loftus

2003-01-01

61

Finishing the euchromatic sequence of the human genome  

Microsoft Academic Search

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and

2004-01-01

62

Genome Sequence of Yersinia pestis KIM  

PubMed Central

We present the complete genome sequence of Yersinia pestis KIM, the etiologic agent of bubonic and pneumonic plague. The strain KIM, biovar Mediaevalis, is associated with the second pandemic, including the Black Death. The 4.6-Mb genome encodes 4,198 open reading frames (ORFs). The origin, terminus, and most genes encoding DNA replication proteins are similar to those of Escherichia coli K-12. The KIM genome sequence was compared with that of Y. pestis CO92, biovar Orientalis, revealing homologous sequences but a remarkable amount of genome rearrangement for strains so closely related. The differences appear to result from multiple inversions of genome segments at insertion sequences, in a manner consistent with present knowledge of replication and recombination. There are few differences attributable to horizontal transfer. The KIM and E. coli K-12 genome proteins were also compared, exposing surprising amounts of locally colinear “backbone,” or synteny, that is not discernible at the nucleotide level. Nearly 54% of KIM ORFs are significantly similar to K-12 proteins, with conserved housekeeping functions. However, a number of E. coli pathways and transport systems and at least one global regulator were not found, reflecting differences in lifestyle between them. In KIM-specific islands, new genes encode candidate pathogenicity proteins, including iron transport systems, putative adhesins, toxins, and fimbriae.

Deng, Wen; Burland, Valerie; Plunkett III, Guy; Boutin, Adam; Mayhew, George F.; Liss, Paul; Perna, Nicole T.; Rose, Debra J.; Mau, Bob; Zhou, Shiguo; Schwartz, David C.; Fetherston, Jaqueline D.; Lindler, Luther E.; Brubaker, Robert R.; Plano, Gregory V.; Straley, Susan C.; McDonough, Kathleen A.; Nilles, Matthew L.; Matson, Jyl S.; Blattner, Frederick R.; Perry, Robert D.

2002-01-01

63

Using comparative genomics to reorder the human genome sequence into a virtual sheep genome  

Microsoft Academic Search

BACKGROUND: Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes? RESULTS: A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the

Brian P Dalrymple; Ewen F Kirkness; Mikhail Nefedov; Sean McWilliam; Abhirami Ratnakumar; Wes Barris; Shaying Zhao; Jyoti Shetty; Jillian F Maddox; Margaret O'Grady; Frank Nicholas; Allan M Crawford; Tim Smith; Pieter J de Jong; John McEwan; V Hutton Oddy; Noelle E Cockett

2007-01-01

64

Accelerating Genome Sequencing 100X with FPGAs  

SciTech Connect

The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

Storaasli, Olaf O [ORNL; Strenski, Dave [Cray, Inc.

2007-01-01

65

Genome sequence of Haemophilus parasuis strain 29755.  

PubMed

Haemophilus parasuis is a member of the family Pasteurellaceae and is the etiologic agent of Glässer's disease in pigs, a systemic syndrome associated with only a subset of isolates. The genetic basis for virulence and systemic spread of particular H. parasuis isolates is currently unknown. Strain 29755 is an invasive isolate that has long been used in the study of Glässer's disease. Accordingly, the genome sequence of strain 29755 is of considerable importance to investigators endeavoring to understand the molecular pathogenesis of H. parasuis. Here we describe the features of the 2,224,137 bp draft genome sequence of strain 29755 generated from 454-FLX pyrosequencing. These data comprise the first publicly available genome sequence for this bacterium. PMID:22180811

Mullins, Michael A; Register, Karen B; Bayles, Darrell O; Dyer, David W; Kuehn, Joanna S; Phillips, Gregory J

2011-09-23

66

Genome sequence of Haemophilus parasuis strain 29755  

PubMed Central

Haemophilus parasuis is a member of the family Pasteurellaceae and is the etiologic agent of Glässer’s disease in pigs, a systemic syndrome associated with only a subset of isolates. The genetic basis for virulence and systemic spread of particular H. parasuis isolates is currently unknown. Strain 29755 is an invasive isolate that has long been used in the study of Glässer’s disease. Accordingly, the genome sequence of strain 29755 is of considerable importance to investigators endeavoring to understand the molecular pathogenesis of H. parasuis. Here we describe the features of the 2,224,137 bp draft genome sequence of strain 29755 generated from 454-FLX pyrosequencing. These data comprise the first publicly available genome sequence for this bacterium.

Mullins, Michael A.; Bayles, Darrell O.; Dyer, David W.; Kuehn, Joanna S.; Phillips, Gregory J.

2011-01-01

67

Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.  

PubMed

Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features make Dunaliella genomic sequences difficult to sequence. Further investigation is needed to reveal the biological significance of these unique sequence features. PMID:17091199
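
Two of the sequence statistics reported above, GC content and dinucleotide microsatellites such as (AC)n, are straightforward to compute. The following is a minimal sketch, assuming an unambiguous A/C/G/T sequence and perfect repeats only; the function names and the `min_units` threshold are hypothetical and do not reflect the tools used in the study.

```python
import re


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_repeats(seq, min_units=4):
    """Find perfect dinucleotide microsatellites.

    Returns a list of (start, unit, copies) tuples for runs of at least
    min_units consecutive copies of a two-base unit.
    """
    # A two-base unit followed by at least (min_units - 1) further copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] != unit[1]:  # skip homopolymer runs such as AAAAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits
```

Tallying the `unit` values over a whole sequence would give a type distribution comparable in spirit to the 76% (AC)n bias reported above, although imperfect repeats and the equivalence of units such as AC, CA, GT and TG would need handling in a real analysis.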

Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

2006-11-01

68

Scoring Pairwise Genomic Sequence Alignments  

Microsoft Academic Search

Introduction: Most sequence alignment programs employ an explicit scheme for assigning a score to every possible alignment. This provides the criterion to prefer one alignment over another. Alignment scores typically involve a score for each possible aligned pair of symbols, together with a penalty for each gap in the alignment. For protein alignments, the scores for all possible aligned pairs constitute a 20-by-20 substitution matrix. Amino

F. Chiaromonte; V. B. Yap; W. Miller

2002-01-01
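
The scoring scheme summarized in the abstract of the record above (a score for each aligned pair of symbols plus a penalty for each gap) can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `score_alignment`, the match/mismatch values, and the gap-open/gap-extend penalties are all assumptions chosen for illustration, and a real scorer would use a full substitution matrix.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1,
                    gap_open=-4, gap_extend=-1):
    """Score two equal-length aligned rows; '-' marks a gap position.

    All numeric parameters are illustrative assumptions. The first
    position of a gap run costs gap_open, each further position of the
    same run costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    in_gap_a = in_gap_b = False  # are we inside a gap run in row a / row b?
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gap characters")
        if a == "-":
            total += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif b == "-":
            total += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += match if a == b else mismatch
            in_gap_a = in_gap_b = False
    return total


print(score_alignment("ACGT", "ACGT"))    # four matches
print(score_alignment("A--T", "ACGT"))    # one two-position gap run
```

Under these assumed parameters, the alignment with the higher total is the preferred one, which is the sense in which a score "provides the criterion to prefer one alignment over another".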

69

The Trichomonas vaginalis Genome Sequencing Project  

NSDL National Science Digital Library

The Institute for Genomic Research (TIGR) in 2003 released the first draft assembly of the Trichomonas vaginalis genome, available through this website to the academic and not-for-profit research community for noncommercial use only. TIGR will release more data at regular intervals during the sequencing project, which should help researchers better understand this widespread parasite and its role in HIV infection, neonatal disorders, predisposition to cervical cancer, and of course, vaginitis. The website also includes background information on T. vaginalis, as well as a link to TIGR's sequencing project for Entamoeba histolytica -- a closely related organism.

70

Complete Genome Sequence of Ikoma Lyssavirus  

PubMed Central

Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection.

Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.

2012-01-01

71

Complete genome sequence of Caulobacter crescentus.  

PubMed

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus. PMID:11259647

Nierman, W C; Feldblyum, T V; Laub, M T; Paulsen, I T; Nelson, K E; Eisen, J A; Heidelberg, J F; Alley, M R; Ohta, N; Maddock, J R; Potocka, I; Nelson, W C; Newton, A; Stephens, C; Phadke, N D; Ely, B; DeBoy, R T; Dodson, R J; Durkin, A S; Gwinn, M L; Haft, D H; Kolonay, J F; Smit, J; Craven, M B; Khouri, H; Shetty, J; Berry, K; Utterback, T; Tran, K; Wolf, A; Vamathevan, J; Ermolaeva, M; White, O; Salzberg, S L; Venter, J C; Shapiro, L; Fraser, C M; Eisen, J

2001-03-20

72

Complete genome sequence of Caulobacter crescentus  

PubMed Central

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.

Nierman, William C.; Feldblyum, Tamara V.; Laub, Michael T.; Paulsen, Ian T.; Nelson, Karen E.; Eisen, Jonathan; Heidelberg, John F.; Alley, M. R. K.; Ohta, Noriko; Maddock, Janine R.; Potocka, Isabel; Nelson, William C.; Newton, Austin; Stephens, Craig; Phadke, Nikhil D.; Ely, Bert; DeBoy, Robert T.; Dodson, Robert J.; Durkin, A. Scott; Gwinn, Michelle L.; Haft, Daniel H.; Kolonay, James F.; Smit, John; Craven, M. B.; Khouri, Hoda; Shetty, Jyoti; Berry, Kristi; Utterback, Teresa; Tran, Kevin; Wolf, Alex; Vamathevan, Jessica; Ermolaeva, Maria; White, Owen; Salzberg, Steven L.; Venter, J. Craig; Shapiro, Lucy; Fraser, Claire M.

2001-01-01

73

Genomic Sequencing of Pleistocene Cave Bears  

Microsoft Academic Search

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of ~1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones

James P. Noonan; Michael Hofreiter; Doug Smith; James R. Priest; Nadin Rohland; Gernot Rabeder; Johannes Krause; J. Chris Detter; Svante Pääbo; Edward M. Rubin

2005-01-01

74

Genome Sequence Assembly Using Trace Signals and Additional Sequence Information  

Microsoft Academic Search

Motivation: This article presents a method for assembling shotgun sequences which primarily uses high confidence regions whilst taking advantage of additional available information such as low confidence regions, quality values or repetitive region tags. Conflict situations are resolved with routines for analysing trace signals. Results: Initial tests with different human and mouse genome projects showed promising results but

Bastien Chevreux; Thomas Wetter; Sándor Suhai

1999-01-01

75

Mapping and sequencing the human genome  

SciTech Connect

Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

none,

1988-01-01

76

Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria  

PubMed Central

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.

Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Ponten, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

2012-01-01
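
The final step described in the abstract of the record above, where the sequence type is determined by the combination of alleles identified, amounts to a lookup of an allele profile in a profile table. The Python sketch below illustrates only that step. The locus names, allele numbers, and sequence type numbers are hypothetical, and the BLAST-based allele ranking that precedes this step in the published method is not reproduced here.

```python
# Hypothetical three-locus scheme; real MLST schemes typically use
# around seven housekeeping loci with curated allele numbering.
LOCI = ("locusA", "locusB", "locusC")

# Hypothetical profile table: allele combination -> sequence type (ST).
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def sequence_type(alleles, loci=LOCI, profiles=PROFILES):
    """Return the ST for a dict of locus -> allele number, or None.

    None means the allele combination is not in the profile table,
    for example a novel combination of known alleles.
    """
    profile = tuple(alleles[locus] for locus in loci)
    return profiles.get(profile)


best_alleles = {"locusA": 1, "locusB": 2, "locusC": 1}
print(sequence_type(best_alleles))
```

Keying the table on an ordered tuple keeps the lookup independent of the order in which alleles were identified, since the scheme's locus order is applied when the profile is built.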

77

The genome sequence DataBase.  

PubMed

The Genome Sequence DataBase (GSDB) is a database of publicly available nucleotide sequences and their associated biological and bibliographic information. Several notable changes have occurred in the past year: GSDB stopped accepting data submissions from researchers; ownership of data submitted to GSDB was transferred to GenBank; sequence analysis capabilities were expanded to include Smith-Waterman and Frame Search; and Sequence Viewer became available to Mac users. The content of GSDB remains up-to-date because publicly available data is acquired from the International Nucleotide Sequence Database Collaboration databases (IC) on a nightly basis. This allows GSDB to continue providing researchers with the ability to analyze, query and retrieve nucleotide sequences in the database. GSDB and its related tools are freely accessible from the URL: http://www.ncgr.org PMID:10592174

Harger, C; Chen, G; Farmer, A; Huang, W; Inman, J; Kiphart, D; Schilkey, F; Skupski, M P; Weller, J

2000-01-01

78

The Genome Sequence DataBase  

PubMed Central

The Genome Sequence DataBase (GSDB) is a database of publicly available nucleotide sequences and their associated biological and bibliographic information. Several notable changes have occurred in the past year: GSDB stopped accepting data submissions from researchers; ownership of data submitted to GSDB was transferred to GenBank; sequence analysis capabilities were expanded to include Smith–Waterman and Frame Search; and Sequence Viewer became available to Mac users. The content of GSDB remains up-to-date because publicly available data is acquired from the International Nucleotide Sequence Database Collaboration databases (IC) on a nightly basis. This allows GSDB to continue providing researchers with the ability to analyze, query and retrieve nucleotide sequences in the database. GSDB and its related tools are freely accessible from the URL: http://www.ncgr.org

Harger, C.; Chen, G.; Farmer, A.; Huang, W.; Inman, J.; Kiphart, D.; Schilkey, F.; Skupski, M. P.; Weller, J.

2000-01-01

79

Genome sequence of Lactobacillus versmoldensis KCTC 3814.  

PubMed

Lactobacillus versmoldensis KCTC 3814 was isolated from raw fermented poultry salami. The species was present in high numbers and frequently dominated the lactic acid bacteria (LAB) populations of the products. Here, we announce the draft genome sequence of Lactobacillus versmoldensis KCTC 3814, isolated from poultry salami, and describe major findings from its annotation. PMID:21914893

Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Nam, Seong-Hyeuk; Kang, Aram; Kim, Aeri; Park, Hong-Seog

2011-10-01

80

VIRAL SEQUENCES INTEGRATED INTO PLANT GENOMES  

Microsoft Academic Search

Sequences of various DNA plant viruses have been found integrated into the host genome. There are two forms of integrant, those that can form episomal viral infections and those that cannot. Integrants of three pararetroviruses, Banana streak virus (BSV), Tobacco vein clearing virus (TVCV), and Petunia vein clearing virus (PVCV), can generate episomal infections in certain hybrid plant hosts in

Glyn Harper; Roger Hull; Ben Lockhart; Neil Olszewski

2002-01-01

81

Hardware accelerator for genomic sequence alignment.  

PubMed

To infer homology and subsequently gene function, the Smith-Waterman algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain billions of sequences, this algorithm becomes computationally expensive. Consequently, in this paper, we focused on accelerating the Smith-Waterman algorithm by modifying the computationally repeated portion of the algorithm by FPGA hardware custom instructions. These simple modifications accelerated the algorithm runtime by an average of 287% compared to the pure software implementation. Therefore, further design of FPGA accelerated hardware offers a promising direction to seeking runtime improvement of genomic database searching. PMID:17946720

Chiang, Jason; Studniberg, Michael; Shaw, Jack; Seto, Shaw; Truong, Kevin

2006-01-01
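
For readers unfamiliar with the algorithm named in the abstract of the record above, the following is a minimal pure-software sketch of Smith-Waterman local alignment scoring in Python. It is not the authors' FPGA design or their software baseline: the function name, the linear gap penalty, and the match/mismatch values are assumptions for illustration. The nested loop over the score matrix is the repeated computation that makes large database searches expensive.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty; all scoring values are illustrative
    assumptions. Runs in O(len(a) * len(b)) time.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local).
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


print(smith_waterman("XXACGTXX", "YYACGTYY"))  # shared core "ACGT"
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells along an anti-diagonal are independent of one another, which is one reason the recurrence is a common target for hardware acceleration.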

82

Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome  

Microsoft Academic Search

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with

MICHAEL MCCLELLAND; RICHARD K. WILSON

1998-01-01

83

Cancer Genome Sequencing—An Interim Analysis  

Microsoft Academic Search

With the publishing of the first complete whole genome of a human cancer and its paired normal, we have passed a key milestone in the cancer genome sequencing strategy. The generation of such data will, thanks to technical advances, soon become commonplace. As a significant number of proof-of-concept studies have been published, it is important to analyze now the likely implications of these data and how this information might frame cancer research in the near future. The diversity of genes

Edward J. Fox; Jesse J. Salk; Lawrence A. Loeb

84

Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux  

PubMed Central

We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ~20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.

Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

2012-01-01

85

Whole-genome sequencing in bacteriology: state of the art  

PubMed Central

Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics.

Dark, Michael J

2013-01-01

86

The Norway spruce genome sequence and conifer genome evolution.  

PubMed

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding. PMID:23698360

Nystedt, Björn; Street, Nathaniel R; Wetterbom, Anna; Zuccolo, Andrea; Lin, Yao-Cheng; Scofield, Douglas G; Vezzi, Francesco; Delhomme, Nicolas; Giacomello, Stefania; Alexeyenko, Andrey; Vicedomini, Riccardo; Sahlin, Kristoffer; Sherwood, Ellen; Elfstrand, Malin; Gramzow, Lydia; Holmberg, Kristina; Hällman, Jimmie; Keech, Olivier; Klasson, Lisa; Koriabine, Maxim; Kucukoglu, Melis; Käller, Max; Luthman, Johannes; Lysholm, Fredrik; Niittylä, Totte; Olson, Ake; Rilakovic, Nemanja; Ritland, Carol; Rosselló, Josep A; Sena, Juliana; Svensson, Thomas; Talavera-López, Carlos; Theißen, Günter; Tuominen, Hannele; Vanneste, Kevin; Wu, Zhi-Qiang; Zhang, Bo; Zerbe, Philipp; Arvestad, Lars; Bhalerao, Rishikesh; Bohlmann, Joerg; Bousquet, Jean; Garcia Gil, Rosario; Hvidsten, Torgeir R; de Jong, Pieter; MacKay, John; Morgante, Michele; Ritland, Kermit; Sundberg, Björn; Thompson, Stacey Lee; Van de Peer, Yves; Andersson, Björn; Nilsson, Ove; Ingvarsson, Pär K; Lundeberg, Joakim; Jansson, Stefan

2013-05-22

87

Complete genome sequence of Pyrobaculum oguniense  

PubMed Central

Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. Here we describe its main chromosome of 2,436,033 bp, with three large-scale inversions and an extra-chromosomal element of 16,887 bp. We have annotated 2,800 protein-coding genes and 145 RNA genes in this genome, including nine H/ACA-like small RNA, 83 predicted C/D box small RNA, and 47 transfer RNA genes. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveals unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus.

Bernick, David L.; Karplus, Kevin; Lui, Lauren M.; Coker, Joanna K. C.; Murphy, Julie N.; Chan, Patricia P.; Cozen, Aaron E.

2012-01-01

88

Genomic multiple sequence alignments: refinement using a genetic algorithm  

Microsoft Academic Search

BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a

Chunlin Wang; Elliot J. Lefkowitz

2005-01-01

89

Initial sequencing and comparative analysis of the mouse genome  

Microsoft Academic Search

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing

Robert H. Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F. Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E. Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R. Brent; Daniel G. Brown; Stephen D. Brown; Carol Bult; John Burton; Jonathan Butler; Robert D. Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T. Chinwalla; Deanna M. Church; Michele Clamp; Christopher Clee; Francis S. Collins; Lisa L. Cook; Richard R. Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D. Delehaunty; Justin Deri; Emmanouil T. Dermitzakis; Colin Dewey; Nicholas J. Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M. Dunn; Sean R. Eddy; Laura Elnitski; Richard D. Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A. Fewell; Paul Flicek; Karen Foley; Wayne N. Frankel; Lucinda A. Fulton; Robert S. Fulton; Terrence S. Furey; Diane Gage; Richard A. Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A. Graves; Eric D. Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C. Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W. Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B. Jaffe; L. Steven Johnson; Matthew Jones; Thomas A. Jones; Ann Joy; Michael Kamal; Elinor K. Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W. James Kent; Andrew Kirby; Diana L. Kolbe; Ian Korf; Raju S. Kucherlapati; Edward J. Kulbokas; David Kulp; Tom Landers; J. P. Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R. Maglott; Elaine R. Mardis; Lucy Matthews; Evan Mauceli; John H. Mayer; Megan McCarthy; W. Richard McCombie; Stuart McLaren; Kirsten McLay; John D. McPherson; Jim Meldrim; Beverley Meredith; Jill P. Mesirov; Webb Miller; Tracie L. Miner; Emmanuel Mongin; Kate T. Montgomery; Michael Morgan; Richard Mott; James C. Mullikin; Donna M. Muzny; William E. Nash; Joanne O. Nelson; Michael N. Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J. O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H. Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S. Pohl; Alex Poliakov; Tracy C. Ponce; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A. Roe; Krishna M. Roskin; Edward M. Rubin; Alistair G. Rust; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B. Singer; Guy Slater; Arian Smit; Douglas R. Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P. Vinson; Andrew C. von Niederhausern; Claire M. Wade; Melanie Wall; Ryan J. Weber; Robert B. Weiss; Michael C. Wendl; Anthony P. West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K. Wilson; Eitan Winter; Kim C. Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M. Zdobnov; Michael C. Zody; Eric S. Lander; Chris P. Ponting; Matthias S. Schwartz

2002-01-01

90

Building the sequence map of the human pan-genome  

Microsoft Academic Search

Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ~5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to

Ruiqiang Li; Yingrui Li; Hancheng Zheng; Ruibang Luo; Hongmei Zhu; Qibin Li; Wubin Qian; Yuanyuan Ren; Geng Tian; Jinxiang Li; Guangyu Zhou; Xuan Zhu; Honglong Wu; Junjie Qin; Xin Jin; Dongfang Li; Hongzhi Cao; Xueda Hu; Hélène Blanche; Howard Cann; Xiuqing Zhang; Songgang Li; Lars Bolund; Karsten Kristiansen; Huanming Yang; Jun Wang; Jian Wang

2009-01-01

91

Cactus: Algorithms for genome multiple sequence alignment  

PubMed Central

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.

Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

2011-01-01

92

Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence  

Microsoft Academic Search

BACKGROUND: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to

Susan E Celniker; David A Wheeler; Brent Kronmiller; Joseph W Carlson; Aaron Halpern; Sandeep Patel; Mark Adams; Mark Champe; Shannon P Dugan; Erwin Frise; Ann Hodgson; Reed A George; Roger A Hoskins; Todd Laverty; Donna M Muzny; Catherine R Nelson; Joanne M Pacleb; Soo Park; Barret D Pfeiffer; Stephen Richards; Erica J Sodergren; Robert Svirskas; Paul E Tabor; Kenneth Wan; Mark Stapleton; Granger G Sutton; Craig Venter; George Weinstock; Steven E Scherer; Eugene W Myers; Richard A Gibbs; Gerald M Rubin

2002-01-01

93

Draft Genome Sequence of Rubrivivax gelatinosus CBS  

SciTech Connect

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

2012-06-01

94

Draft Genome Sequence of Rubrivivax gelatinosus CBS  

PubMed Central

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

Hu, Pingsha; Lang, Juan; Wawrousek, Karen; Yu, Jianping; Maness, Pin-Ching

2012-01-01

95

Sequence of the Oxytricha trifallax macronuclear genome  

Microsoft Academic Search

We propose complete sequencing of the macronuclear genome of the ciliated protozoan Oxytricha trifallax (Alveolate; class Spirotrichea). Ciliates have been important experimental organisms for over 100 years, contributing to the discovery and understanding of many essential cellular processes—including self-splicing RNA, telomere biochemistry, and transcriptional regulation by histone modification—with Oxytricha representing the lineage—the spirotrichs—with the very surprising discoveries of gene-sized

Thomas G. Doak; Glenn Herrick; Laura F. Landweber; Robert B. Weiss

96

The Predictive Capacity of Personal Genome Sequencing  

PubMed Central

New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a “genometype”. A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that: (i) for 23 of the 24 diseases, the majority of individuals will receive negative test results, (ii) these negative test results will, in general, not be very informative, as the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 - 80% of that in the general population, and (iii) on the positive side, in the best-case scenario more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy makers and consumers.

Roberts, Nicholas J.; Vogelstein, Joshua T.; Parmigiani, Giovanni; Kinzler, Kenneth W.; Vogelstein, Bert; Velculescu, Victor E.

2013-01-01

97

Global Identification of Human Transcribed Sequences with Genome Tiling Arrays  

Microsoft Academic Search

Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA

Paul Bertone; Viktor Stolc; Thomas E. Royce; Joel S. Rozowsky; Alexander E. Urban; Xiaowei Zhu; John L. Rinn; Waraporn Tongprasit; Manoj Samanta; Sherman Weissman; Mark Gerstein; Michael Snyder

2004-01-01

98

Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops  

PubMed Central

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis.

Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G.; Manners, John M.

2013-01-01

99

Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops.  

PubMed

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis. PMID:23661484

Gardiner, Donald M; Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G; Manners, John M

2013-05-09

100

Identification of ancient remains through genomic sequencing.  

PubMed

Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

Blow, Matthew J; Zhang, Tao; Woyke, Tanja; Speller, Camilla F; Krivoshapkin, Andrei; Yang, Dongya Y; Derevianko, Anatoly; Rubin, Edward M

2008-04-21

101

Data structures and compression algorithms for genomic sequence data  

Microsoft Academic Search

Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function, and evolution, but also for the storage, navigation, and privacy of genomic data. Here we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and

Marty C. Brandon; Douglas C. Wallace; Pierre Baldi

2009-01-01
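
The storage problem described in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible scheme, packing each of the four bases into 2 bits; this is a generic illustration and not the data structures or algorithms developed by the authors, which the truncated abstract does not specify. The names `pack` and `unpack` are hypothetical.

```python
# Illustrative 2-bit packing of DNA (assumption: plain A/C/G/T input).
# This is NOT the encoding from the paper, only a generic baseline.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> tuple[bytes, int]:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Each base occupies 2 bits; position within the byte is i % 4.
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string from packed bytes and its length."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


packed, n = pack("GATTACA")
restored = unpack(packed, n)
```

The length is stored alongside the bytes because the last byte may be only partly used. Ambiguity codes such as N are not handled here and would raise a `KeyError`.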

102

The Center for Eukaryotic Structural Genomics.  

PubMed

The Center for Eukaryotic Structural Genomics (CESG) is a "specialized" or "technology development" center supported by the Protein Structure Initiative (PSI). CESG's mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG's platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy. PMID:19130299

Markley, John L; Aceti, David J; Bingman, Craig A; Fox, Brian G; Frederick, Ronnie O; Makino, Shin-ichi; Nichols, Karl W; Phillips, George N; Primm, John G; Sahu, Sarata C; Vojtik, Frank C; Volkman, Brian F; Wrobel, Russell L; Zolnai, Zsolt

2009-01-08

103

Advances in understanding cancer genomes through second-generation sequencing  

Microsoft Academic Search

Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing) — through whole-genome, whole-exome and whole-transcriptome approaches — is allowing substantial advances in cancer genomics. These methods are facilitating an increase in

Stacey Gabriel; Gad Getz; Matthew Meyerson

2010-01-01

104

Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence  

Microsoft Academic Search

BACKGROUND: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS)

Frank M You; Naxin Huo; Karin R Deal; Yong Q Gu; Ming-Cheng Luo; Patrick E McGuire; Jan Dvorak; Olin D Anderson

2011-01-01

105

The Genome Sequence DataBase: towards an integrated functional genomics resource.  

PubMed Central

During 1998 the primary focus of the Genome Sequence DataBase (GSDB; http://www.ncgr.org/gsdb ) located at the National Center for Genome Resources (NCGR) has been to improve data quality, improve data collections, and provide new methods and tools to access and analyze data. Data quality has been improved by extensive curation of certain data fields necessary for maintaining data collections and for using certain tools. Data quality has also been increased by improvements to the suite of programs that import data from the International Nucleotide Sequence Database Collaboration (IC). The Sequence Tag Alignment and Consensus Knowledgebase (STACK), a database of human expressed gene sequences developed by the South African National Bioinformatics Institute (SANBI), became available within the last year, allowing public access to this valuable resource of expressed sequences. Data access was improved by the addition of the Sequence Viewer, a platform-independent graphical viewer for GSDB sequence data. This tool has also been integrated with other searching and data retrieval tools. A BLAST homology search service was also made available, allowing researchers to search all of the data, including the unique data, that are available from GSDB. These improvements are designed to make GSDB more accessible to users, extend the rich searching capability already present in GSDB, and to facilitate the transition to an integrated system containing many different types of biological data.

Skupski, M P; Booker, M; Farmer, A; Harpold, M; Huang, W; Inman, J; Kiphart, D; Kodira, C; Root, S; Schilkey, F; Schwertfeger, J; Siepel, A; Stamper, D; Thayer, N; Thompson, R; Wortman, J; Zhuang, J J; Harger, C

1999-01-01

106

The Genome Sequence DataBase: towards an integrated functional genomics resource.  

PubMed

During 1998 the primary focus of the Genome Sequence DataBase (GSDB; http://www.ncgr.org/gsdb ) located at the National Center for Genome Resources (NCGR) has been to improve data quality, improve data collections, and provide new methods and tools to access and analyze data. Data quality has been improved by extensive curation of certain data fields necessary for maintaining data collections and for using certain tools. Data quality has also been increased by improvements to the suite of programs that import data from the International Nucleotide Sequence Database Collaboration (IC). The Sequence Tag Alignment and Consensus Knowledgebase (STACK), a database of human expressed gene sequences developed by the South African National Bioinformatics Institute (SANBI), became available within the last year, allowing public access to this valuable resource of expressed sequences. Data access was improved by the addition of the Sequence Viewer, a platform-independent graphical viewer for GSDB sequence data. This tool has also been integrated with other searching and data retrieval tools. A BLAST homology search service was also made available, allowing researchers to search all of the data, including the unique data, that are available from GSDB. These improvements are designed to make GSDB more accessible to users, extend the rich searching capability already present in GSDB, and to facilitate the transition to an integrated system containing many different types of biological data. PMID:9847136

Skupski, M P; Booker, M; Farmer, A; Harpold, M; Huang, W; Inman, J; Kiphart, D; Kodira, C; Root, S; Schilkey, F; Schwertfeger, J; Siepel, A; Stamper, D; Thayer, N; Thompson, R; Wortman, J; Zhuang, J J; Harger, C

1999-01-01

107

Aligning Two Genomic Sequences That Contain Duplications  

NASA Astrophysics Data System (ADS)

It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

Hou, Minmei; Riemer, Cathy; Berman, Piotr; Hardison, Ross C.; Miller, Webb

108

The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics  

PubMed Central

The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.

2003-01-01
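
The figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a sketch using the rounded values as printed (the abstract itself calls several of them approximate), so the results are approximate as well; the variable names are illustrative.

```python
# Gene categories reported for C. briggsae (values as quoted in the abstract).
orthologs, homologs, no_match = 12_200, 6_500, 800
total_genes = orthologs + homologs + no_match

# Genome sizes in Mbp and the repeat fractions quoted for each species.
briggsae_mbp, elegans_mbp = 104.0, 100.3
briggsae_repeat_mbp = briggsae_mbp * 0.224
elegans_repeat_mbp = elegans_mbp * 0.165

# Difference in total size versus difference in repetitive sequence.
size_gap_mbp = briggsae_mbp - elegans_mbp
repeat_gap_mbp = briggsae_repeat_mbp - elegans_repeat_mbp

print(total_genes, round(size_gap_mbp, 1), round(repeat_gap_mbp, 1))
```

With these rounded inputs the three gene categories sum to the predicted total, and the estimated gap in repetitive sequence is at least as large as the gap in genome size, which is consistent with the statement that repeats account for the size difference.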

109

Ten years of bacterial genome sequencing: comparative-genomics-based discoveries  

Microsoft Academic Search

It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: “What have we learned from this vast amount of

Tim T. Binnewies; Yair Motro; Peter F. Hallin; Ole Lund; David Dunn; Tom La; David J. Hampson; Matthew Bellgard; Trudy M. Wassenaar; David W. Ussery

2006-01-01

110

THE RICE GENOME: The Cereal of the World's Poor Takes Center Stage  

NSDL National Science Digital Library

Access to the article is free; however, registration and sign-in are required. The milestone publication of not one, but two, draft genome sequences of rice (Oryza sativa) brought the cereal crop of the world's poor to center stage. In their Perspectives, Cantrell and Reeves discuss the potential impacts of these sequences for humankind from the standpoints of food security and combating malnutrition.

Ronald P. Cantrell (International Rice Research Institute (IRRI)); Timothy G. Reeves (International Maize and Wheat Improvement Center (CIMMYT))

2002-04-05

111

Initial sequencing and comparative analysis of the mouse genome.  

PubMed

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism. PMID:12466850

Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

2002-12-01

112

Initial sequencing and comparative analysis of the mouse genome  

SciTech Connect

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

2002-12-15

113

Research ethics and the challenge of whole-genome sequencing  

PubMed Central

The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants’ relatives; and the future use of samples and data taken for whole-genome sequencing. Although the issues are not new, we discuss their implications for personal genomics and provide recommendations for appropriate management in the context of research involving individual whole-genome sequencing.

McGuire, Amy L.; Caulfield, Timothy; Cho, Mildred K.

2008-01-01

114

Complete Genome Sequence of Serratia plymuthica Bacteriophage ϕMAM1  

PubMed Central

A virulent bacteriophage (ϕMAM1) that infects Serratia plymuthica was isolated from the natural environment and characterized. Genomic sequence analysis revealed a circular double-stranded DNA sequence of 157,834 bp, encoding 198 proteins and 3 tRNAs. The ϕMAM1 genome shows high homology to previously reported ViI-like enterobacterial bacteriophage genomes.

Matilla, Miguel A.

2012-01-01

115

A snapshot of the emerging tomato genome sequence  

Microsoft Academic Search

The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial

L. A. Mueller; R. M. Klein Lankhorst; S. D. Tanksley; R. M. Peters; Staveren van M. J; E. Datema; M. W. E. J. Fiers; Ham van R. C. H. J; D. Szinay; Jong de J. H. S. G. M; N. Menda; I. Y. Tecle; A. Bombarely; S. Stack; S. M. Royer; S.-B. Chang; L. A. Shearer; B. D. Kim; S.-H. Jo; C.-G. Hur; D. Choi; C.-B. Li; J. Zhao; H. Jiang; Y. Geng; Y. Dai; H. Fan; J. Chen; F. Lu; J. Shi; S. Sun; X. Yang; C. Lu; M. Chen; Z. Cheng; H. Ling; Y. Xue; Y. Wang; G. B. Seymour; G. J. Bishop; G. Bryan; J. Rogers; S. Sims; S. Butcher; D. Buchan; J. Abbott; H. Beasley; C. Nicholson; C. Riddle; S. Humphray; K. McLaren; S. Mathur; S. Vyas; A. U. Solanke; R. Kumar; V. Gupta; A. K. Sharma; P. Khurana; J. P. Khurana; A. Tyagi; Sarita; P. Chowdhury; S. Shridhar; D. Chattopadhyay; A. Pandit; P. Singh; A. Kumar; R. Dixit; A. Singh; S. Praveen; V. Dalal; M. Yadav; I. A. Ghazi; K. Gaikwad; T. R. Sharma; T. Mohapatra; N. K. Singh; H. de Jong; S. Peters; M. van Staveren; R. C. H. J. van Ham; P. Lindhout; M. Philippot; P. Frasse; F. Regad; M. Zouine; M. Bouzayen; E. Asamizu; S. Sato; H. Fukuoka; S. Tabata; D. Shibata; M. A. Botella; M. Perez-Alonso; V. Fernandez-Pedrosa; S. Osorio; A. Mico; A. Granell; Z. Zhang; J. He; S. Huang; Y. Du; D. Qu; L. Liu; D. Liu; J. Wang; Z. Ye; W. Yang; G. Wang; A. Vezzi; S. Todesco; G. Valle; G. Falcone; M. Pietrella; G. Giuliano; S. Grandillo; A. Traini; N. D'Agostino; M. L. Chiusano; M. Ercolano; A. Barone; L. Frusciante; H. Schoof; A. Jocker; R. Bruggmann; M. Spannagl; K. X. F. Mayer; R. Guigo; F. Camara; S. Rombauts; J. A. Fawcett; Y. Van de Peer; S. Knapp; D. Zamir; W. Stiekema

2009-01-01

116

Genome sequence of Pediococcus pentosaceus strain IE-3.  

PubMed

We report the 1.8-Mb genome sequence of Pediococcus pentosaceus strain IE-3, isolated from a dairy effluent sample. The whole-genome sequence of this strain will aid in comparative genomics of Pediococcus pentosaceus strains of diverse ecological origins and their biotechnological applications. PMID:22843596

Midha, Samriti; Ranjan, Manish; Sharma, Vikas; Kumari, Annu; Singh, Pradip Kumar; Korpole, Suresh; Patil, Prabhu B

2012-08-01

117

Complete genome sequence of the alkaliphilic bacterium Bacillus halodurans and genomic sequence comparison with Bacillus subtilis  

Microsoft Academic Search

The 4 202 353 bp genome of the alkaliphilic bacterium Bacillus halodurans C-125 contains 4066 predicted protein coding sequences (CDSs), 2141 (52.7%) of which have functional assignments, 1182 (29%) of which are conserved CDSs with unknown function and 743 (18.3%) of which have no match to any protein database. Among the total CDSs, 8.8% match sequences of proteins found only

Hideto Takami; Kaoru Nakasone; Yoshihiro Takaki; Go Maeno; Rumie Sasaki; Noriaki Masui; Fumie Fuji; Chie Hirama; Yuka Nakamura; Naotake Ogasawara; Satoru Kuhara; Koki Horikoshi

2000-01-01
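
The CDS breakdown quoted in the abstract above can be verified directly from the counts. The sketch below uses only the numbers given there; the category labels are shortened paraphrases chosen for this example.

```python
# Counts of predicted protein coding sequences (CDSs) as quoted in the abstract.
total_cds = 4066
categories = {
    "functional assignment": 2141,
    "conserved, unknown function": 1182,
    "no database match": 743,
}

# The three categories should partition the full set of CDSs.
assert sum(categories.values()) == total_cds

# Percentage share of each category, rounded to one decimal place.
shares = {
    name: round(100 * count / total_cds, 1)
    for name, count in categories.items()
}
print(shares)
```

The three counts add up to the stated total, and the recomputed shares agree with the quoted percentages to within rounding (the abstract gives the middle category as 29%).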

118

Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags  

PubMed Central

Background: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding: Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is presented by the ratio of chromosome size and genome size (or between scaffold size and genome size), which ranged from 55.31% to nearly 100%. The accuracy of genome sequence was presented by the ratio between matched EST and selected ESTs where 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influences genome sequence assembly. Conclusion: The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published.

Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

2013-01-01
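
The two quality measures defined in the abstract above are simple ratios, and a minimal sketch makes the definitions concrete. The function names and the input numbers are hypothetical, chosen only to reproduce values at the ends of the reported ranges; they are not data from the study, and the EST mapping step itself is not shown.

```python
def integrity(assembled_bp: int, genome_bp: int) -> float:
    """Assembled (chromosome or scaffold) size as a percentage of genome size."""
    return 100 * assembled_bp / genome_bp


def accuracy(matched_ests: int, selected_ests: int) -> float:
    """Percentage of randomly selected ESTs that map to the assembly."""
    return 100 * matched_ests / selected_ests


# Hypothetical inputs, for illustration only.
example_integrity = integrity(55_310_000, 100_000_000)
example_accuracy = accuracy(9_828, 10_000)
print(example_integrity, example_accuracy)
```

Both measures are percentages, so assemblies of very different sizes can be compared on the same scale.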

119

[Progress on whole genome sequencing in woody plants].  

PubMed

In recent years, the volume of plant whole-genome sequencing data has increased rapidly, and whole-genome sequencing has also been widely performed in woody plants. However, whole-genome sequencing in woody plants faces a set of obstacles, including larger genomes, complex genome structure, limitations in assembly, annotation, and functional analysis, and restricted research funding. Therefore, to improve the efficiency of whole-genome sequencing in woody plants, the progress and shortcomings of this field should be analyzed. Three generations of sequencing technologies (i.e., Sanger sequencing, sequencing by synthesis, and single-molecule sequencing) were compared in our study. The review of progress focused mainly on whole-genome sequencing in four woody plants (Populus, grapevine, papaya, and apple), and the application of sequencing results was also analyzed. The future of whole-genome sequencing research in woody plants, comprising material selection, establishment of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results, was discussed. PMID:22382056

Shi, Ji-Sen; Wang, Zhan-Jun; Chen, Jin-Hui

2012-02-01

120

Complete genome sequence of Liberibacter crescens BT-1.  

PubMed

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight on Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Compared with other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the known Liberibacter strains have not yet been cultured. Similar to Candidatus L. asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions. PMID:23408754

Leonard, Michael T; Fagen, Jennie R; Davis-Richardson, Austin G; Davis, Michael J; Triplett, Eric W

2012-12-12
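
The G+C content quoted in the abstract above is the share of G and C bases among all bases of the assembled chromosome. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not part of the cited work, and the real value would come from the full 1,504,659 bp assembly.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters
    # do not dilute the percentage.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    toy = "ATGCATTTAAATGCAT"  # hypothetical 16-base example, not real data
    print(f"G+C content: {gc_content(toy):.1f}%")
```

Ambiguous bases are excluded from the denominator here; some tools count them instead, which slightly lowers the reported percentage.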

121

Complete genome sequence of Liberibacter crescens BT-1  

PubMed Central

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight on Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Compared with other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the known Liberibacter strains have not yet been cultured. Similar to Candidatus L. asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions.

Leonard, Michael T.; Fagen, Jennie R.; Davis-Richardson, Austin G.; Davis, Michael J.; Triplett, Eric W.

2012-01-01

122

Detecting long tandem duplications in genomic sequences  

PubMed Central

Background: Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results: In this paper, we introduce ReD Tandem, software using a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS?

2012-01-01

123

Methods for Obtaining and Analyzing Whole Chloroplast Genome Sequences  

Microsoft Academic Search

During the past decade, there has been a rapid increase in our understanding of plastid genome organization and evolution due to the availability of many new completely sequenced genomes. There are 45 complete genomes published and ongoing projects are likely to increase this sampling to nearly 200 genomes during the next 5 years. Several groups of researchers including ours have

Robert K. Jansen; Linda A. Raubeson; Jeffrey L. Boore; Claude W. dePamphilis; Timothy W. Chumley; Rosemarie C. Haberle; Stacia K. Wyman; Andrew J. Alverson; Rhiannon Peery; Sallie J. Herman; H. Matthew Fourcade; Jennifer V. Kuehl; Joel R. McNeal; James Leebens-Mack; Liying Cui

2005-01-01

124

Genomic Sequence Comparisons, 1987-2003 Final Report  

SciTech Connect

This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

George M. Church

2004-07-29

125

Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain  

PubMed Central

This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.

Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

2012-01-01

126

The genome sequence of Podospora anserina, a classic model fungus  

PubMed Central

The completed genome sequence of the coprophilous fungus Podospora anserina increases the sampling of fungal genomes. In line with its habitat of herbivore dung, this ascomycete has an exceptionally rich gene set devoted to the catabolism of complex carbohydrates.

Paoletti, Mathieu; Saupe, Sven J

2008-01-01

127

Next Generation Sequencing at the University of Chicago Genomics Core  

ScienceCinema

The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to state-of-the-art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and microarrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

128

Rapid genome sequencing with short universal tiling probes  

Microsoft Academic Search

The increasing availability of high-quality reference genomic sequences has created a demand for ways to survey the sequence differences present in individual genomes. Here we describe a DNA sequencing method based on hybridization of a universal panel of tiling probes. Millions of shotgun fragments are amplified in situ and subjected to sequential hybridization with short fluorescent probes. Long fragments of

Arno Pihlak; Göran Baurén; Ellef Hersoug; Peter Lönnerberg; Ats Metsis; Sten Linnarsson

2008-01-01

129

An update and lessons from whole-genome sequencing projects  

Microsoft Academic Search

A number of prokaryotic and eukaryotic genomes are currently being sequenced. Already, the nucleotide sequences of four yeast chromosomes and of 2.2 Mb from Caenorhabditis elegans have been reported. Human genomic sequences have also been used in comparative studies with both mouse and Fugu rubripes.

Steven JM Jones

1995-01-01

130

Next-generation sequencing and potential applications in fungal genomics.  

PubMed

Since the first fungal genome was sequenced in 1996, sequencing technologies have advanced dramatically. In recent years, it has become possible to cost-effectively generate vast amounts of DNA sequence data using a number of cell- and electrophoresis-free sequencing technologies, commonly known as "next" or "second" generation. In this chapter, we present a brief overview of next-generation sequencers that are commercially available now. Their potential applications in fungal genomics studies are discussed. PMID:21590412

Sanmiguel, Phillip

2011-01-01

131

Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)  

SciTech Connect

Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Teshima, Hazuki [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

132

Reconstructing cancer genomes from paired-end sequencing data  

PubMed Central

Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.

2012-01-01

133

Complete genome sequence of Streptosporangium roseum type strain (NI 9100).  

PubMed

Streptosporangium roseum Couch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The 'pinkish coiled Streptomyces-like organism with a spore case' was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304675

Nolan, Matt; Sikorski, Johannes; Jando, Marlen; Lucas, Susan; Lapidus, Alla; Glavina Del Rio, Tijana; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Sims, David; Meincke, Linda; Brettin, Thomas; Han, Cliff; Detter, John C; Bruce, David; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

2010-01-28

134

Applications of next-generation sequencing technologies in functional genomics  

Microsoft Academic Search

A new generation of sequencing technologies, from Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos, has provided unprecedented opportunities for high-throughput functional genomic research. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing, discovery of transcription factor binding sites, and noncoding RNA expression profiling. This review discusses applications of next-generation sequencing technologies in functional genomics

Olena Morozova; Marco A. Marra

2008-01-01

135

Genome-Scale Validation of Deep-Sequencing Libraries  

Microsoft Academic Search

Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used

Dominic Schmidt; Rory Stark; Michael D. Wilson; Gordon D. Brown; Duncan T. Odom; Jürg Bähler

2008-01-01

136

Ancient human genome sequence of an extinct Palaeo-Eskimo  

Microsoft Academic Search

We report here the genome sequence of an ancient human. Obtained from ~4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide

Morten Rasmussen; Yingrui Li; Stinus Lindgreen; Jakob Skou Pedersen; Anders Albrechtsen; Ida Moltke; Mait Metspalu; Ene Metspalu; Toomas Kivisild; Ramneek Gupta; Marcelo Bertalan; Kasper Nielsen; M. Thomas; P. Gilbert; Yong Wang; Maanasa Raghavan; Paula F. Campos; Hanne Munkholm Kamp; Andrew S. Wilson; Andrew Gledhill; Silvana Tridico; Michael Bunce; Eline D. Lorenzen; Jonas Binladen; Xiaosen Guo; Jing Zhao; Xiuqing Zhang; Hao Zhang; Tracey L. Pierre; Morten Meldgaard; Sardana A. Fedorova; Ludmila P. Osipova; Thomas F. G. Higham; Christopher Bronk; Finn C. Nielsen; Michael H. Crawford; Søren Brunak; Thomas Sicheritz-Ponten; Richard Villems; Rasmus Nielsen; Anders Krogh; Jun Wang; Eske Willerslev

2010-01-01
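
The "average depth of 20×" in the abstract above is the usual ratio of total sequenced bases to genome size. The Python sketch below shows that arithmetic; the genome size, read length, and read count are round hypothetical figures chosen for illustration and are not taken from the cited study.

```python
import math


def mean_depth(read_count: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return read_count * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical round numbers, not values from the study.
    genome_size = 3_000_000_000  # assumed genome size in bases
    read_length = 100            # assumed read length in bases
    n_reads = reads_needed(20, read_length, genome_size)
    print(n_reads, mean_depth(n_reads, read_length, genome_size))
```

Average depth says nothing about how evenly reads are distributed, which is one reason a genome sequenced to 20× can still be only partly recovered.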

137

Synergy between sequence and size in Large-scale genomics  

Microsoft Academic Search

Until recently the study of individual DNA sequences and of total DNA content (the C-value) sat at opposite ends of the spectrum in genome biology. For gene sequencers, the vast stretches of non-coding DNA found in eukaryotic genomes were largely considered to be an annoyance, whereas genome-size researchers attributed little relevance to specific nucleotide sequences. However, the dawn of comprehensive

T. Ryan Gregory

2005-01-01

138

Genome sequencing and analysis of the biomass-degrading fungus ...  

Treesearch

Title: Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. ... Keywords: Cellulase, microbial metabolism, regulation, enzymes, industrial applications, biotechnology, gene expression, chemical ...

139

Complete genome sequence of Allochromatium vinosum DSM 180T  

PubMed Central

Allochromatium vinosum, formerly Chromatium vinosum, is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium currently contains five species. All members were isolated from freshwater, brackish water or marine habitats and are predominantly obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program.

Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-juan; Detter, John C.; Han, Cliff; Hauser, Loren; Jeffries, Cynthia D.; Land, Miriam; Munk, A. Christine; Tapia, Roxanne; Dahl, Christiane

2011-01-01

140

Genome Sequence of Lactobacillus plantarum Strain UCMA 3037  

PubMed Central

Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias

2013-01-01

141

Ultra-high Throughput Sequencing and Genomics in CDRH  

Center for Biologics Evaluation and Research (CBER)

Ultra-high Throughput Sequencing and Genomics in CDRH ... Ultra-high Throughput Sequencing: Informal scientific meetings with ...

142

The $1000 Genome: Ethical and Legal Issues in Whole Genome Sequencing of Individuals  

Microsoft Academic Search

Progress in gene sequencing could make rapid whole genome sequencing of individuals affordable to millions of persons and useful for many purposes in a future era of genomic medicine. Using the idea of the $1000 genome as a focus, this article reviews the main technical, ethical, and legal issues that must be resolved to make mass genotyping of individuals cost-effective and

John A. Robertson

2003-01-01

143

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome  

Microsoft Academic Search

Background: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D.

Casey M Bergman; Barret D Pfeiffer; Diego E Rincón-Limas; Roger A Hoskins; Andreas Gnirke; Chris J Mungall; Adrienne M Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth Wan; Reed A George; Pieter J de Jong; Juan Botas; Gerald M Rubin; Susan E Celniker

2002-01-01

144

De Novo Next Generation Sequencing of Plant Genomes  

Microsoft Academic Search

The genome sequencing of all major food and bioenergy crops is of critical importance in the race to improve crop production to meet the future food and energy security needs of the world. Next generation sequencing technologies have brought about great improvements in sequencing throughput and cost, but do not yet allow for de novo sequencing of large repetitive genomes

Steve Rounsley; Pradeep Reddy Marri; Yeisoo Yu; Ruifeng He; Nick Sisneros; Jose Luis Goicoechea; So Jeong Lee; Angelina Angelova; Dave Kudrna; Meizhong Luo; Jason Affourtit; Brian Desany; James Knight; Faheem Niazi; Michael Egholm; Rod A. Wing

2009-01-01

145

Genome Project Standards in a New Era of Sequencing  

SciTech Connect

For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the quality of the genome sequence, based on our collective understanding of the different technologies, available assemblers, and the varied efforts to improve upon drafted genomes. Due to the increasingly rapid pace of genomics we avoided the use of rigid numerical thresholds in our definitions to take into account the types of products achieved by any combination of technology, chemistry, assembler, or improvement/finishing process.

GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

2009-06-01

146

Finishing The Euchromatic Sequence Of The Human Genome  

SciTech Connect

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

2004-09-07
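
To put the accuracy figure in the abstract above in perspective, an error rate of about 1 event per 100,000 bases corresponds to a Phred-scaled quality of 50, and across 2.85 billion nucleotides it implies on the order of 28,500 residual events. The Python sketch below reproduces this arithmetic; the function names are illustrative, and treating each "event" as an independent per-base error is a simplifying assumption.

```python
import math


def phred_quality(bases_per_error: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(error probability)."""
    return 10.0 * math.log10(bases_per_error)


def expected_errors(n_bases: int, bases_per_error: float) -> float:
    """Expected number of error events for a sequence of n_bases."""
    return n_bases / bases_per_error


if __name__ == "__main__":
    build35_bases = 2_850_000_000  # 2.85 billion nucleotides (from the abstract)
    bases_per_error = 100_000      # ~1 event per 100,000 bases (from the abstract)
    print(phred_quality(bases_per_error))
    print(expected_errors(build35_bases, bases_per_error))
```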

147

Whole-genome sequencing and variant discovery in C. elegans  

Microsoft Academic Search

Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage

LaDeana W Hillier; Gabor T Marth; Aaron R Quinlan; David Dooling; Ginger Fewell; Derek Barnett; Paul Fox; Jarret I Glasscock; Matthew Hickenbotham; Weichun Huang; Vincent J Magrini; Ryan J Richt; Sacha N Sander; Donald A Stewart; Michael Stromberg; Eric F Tsung; Todd Wylie; Tim Schedl; Richard K Wilson; Elaine R Mardis

2008-01-01

148

Validation of rice genome sequence by optical mapping  

PubMed Central

Background: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results: To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties.

Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

2007-01-01
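
The in silico restriction maps mentioned in the abstract above are ordered lists of fragment lengths predicted from sequence by locating every recognition site of the enzyme. The Python sketch below illustrates the idea for SwaI, whose recognition sequence is ATTTAAAT with a blunt cut in the middle. The function name, the linear-molecule assumption, and the toy sequence are illustrative only; this is not the authors' map construction or alignment pipeline.

```python
def in_silico_digest(sequence: str, site: str = "ATTTAAAT", cut_offset: int = 4) -> list[int]:
    """Return ordered fragment lengths for a linear molecule cut at every site.

    cut_offset is the cut position within the recognition site
    (4 for SwaI: ATTT^AAAT). The site is palindromic, so scanning
    one strand is enough.
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Advance by one base so overlapping sites are also found.
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Hypothetical toy sequence with two SwaI sites, not real rice sequence.
    toy = "GGGG" + "ATTTAAAT" + "CCCCCC" + "ATTTAAAT" + "TT"
    print(in_silico_digest(toy))
```

An ordered fragment list like this is what would be compared against the measured optical map; discordant fragment sizes or missing cuts point to candidate gaps or misassemblies.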

149

Mapping and sequencing complex genomes: let's get physical!  

Microsoft Academic Search

Physical maps provide an essential framework for ordering and joining sequence data, genetically mapped markers and large-insert clones in eukaryotic genome projects. A good physical map is also an important resource for cloning specific genes of interest, comparing genomes, and understanding the size and complexity of a genome. Although physical maps are usually taken at face value, a good deal

Blake C. Meyers; Simone Scalabrin; Michele Morgante

2004-01-01

150

Beyond the Sequence: Cellular Organization of Genome Function  

Microsoft Academic Search

Genomes are more than linear sequences. In vivo they exist as elaborate physical structures, and their functional properties are strongly determined by their cellular organization. I discuss here the functional relevance of spatial and temporal genome organization at three hierarchical levels: the organization of nuclear processes, the higher-order organization of the chromatin fiber, and the spatial arrangement of genomes

Tom Misteli

2007-01-01

151

Mapping copy number variation by population-scale genome sequencing  

Microsoft Academic Search

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from

Ryan E. Mills; Klaudia Walter; Chip Stewart; Robert E. Handsaker; Ken Chen; Can Alkan; Alexej Abyzov; Seungtai Chris Yoon; Kai Ye; R. Keira Cheetham; Asif Chinwalla; Donald F. Conrad; Yutao Fu; Fabian Grubert; Iman Hajirasouliha; Fereydoun Hormozdiari; Lilia M. Iakoucheva; Zamin Iqbal; Shuli Kang; Jeffrey M. Kidd; Miriam K. Konkel; Joshua Korn; Ekta Khurana; Deniz Kural; Hugo Y. K. Lam; Jing Leng; Ruiqiang Li; Yingrui Li; Chang-Yun Lin; Ruibang Luo; Xinmeng Jasmine Mu; James Nemesh; Heather E. Peckham; Tobias Rausch; Aylwyn Scally; Xinghua Shi; Michael P. Stromberg; Adrian M. Stütz; Alexander Eckehart Urban; Jerilyn A. Walker; Jiantao Wu; Yujun Zhang; Zhengdong D. Zhang; Mark A. Batzer; Li Ding; Gabor T. Marth; Gil McVean; Jonathan Sebat; Michael Snyder; Jun Wang; Kenny Ye; Evan E. Eichler; Mark B. Gerstein; Matthew E. Hurles; Charles Lee; Steven A. McCarroll; Jan O. Korbel

2011-01-01

152

Genome Sequence of Enterohemorrhagic Escherichia coli NCCP15658  

PubMed Central

Enterohemorrhagic Escherichia coli causes severe food-borne disease in the guts of humans and animals. Here, we report the high-quality draft genome sequence of E. coli NCCP15658 isolated from a patient in the Republic of Korea. Its genome size was determined to be 5.46 Mb, and its genomic features, including genes encoding virulence factors, were analyzed.

Song, Ju Yeon; Yoo, Ran Hee; Jang, Song Yee; Seong, Won-Keun; Kim, Seon-Young; Jeong, Haeyoung; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Yu, Dong Su; Park, Mi-Sun

2012-01-01

153

Genome sequence of Brevibacillus laterosporus strain GI-9.  

PubMed

We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains. PMID:22328768

Sharma, Vikas; Singh, Pradip K; Midha, Samriti; Ranjan, Manish; Korpole, Suresh; Patil, Prabhu B

2012-03-01

154

Genome Sequence of Brevibacillus laterosporus Strain GI-9  

PubMed Central

We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains.

Sharma, Vikas; Singh, Pradip K.; Midha, Samriti; Ranjan, Manish

2012-01-01

155

Identification of Candidate Drosophila Olfactory Receptors from Genomic DNA Sequence  

Microsoft Academic Search

We have taken advantage of the availability of a large amount of Drosophila genomic DNA sequence in the Berkeley Drosophila Genome Project database (~1/5 of the genome) to identify a family of novel seven transmembrane domain encoding genes that are putative Drosophila olfactory receptors. Members of the family are expressed in distinct subsets of olfactory neurons, and certain family members

Qian Gao; Andrew Chess

1999-01-01

156

Genome Sequence of the Rice Pathogen Pseudomonas fuscovaginae CB98818  

PubMed Central

Pseudomonas fuscovaginae is a phytopathogenic bacterium causing bacterial sheath brown rot of cereal crops. Here, we present the draft genome sequence of P. fuscovaginae CB98818, originally isolated from a diseased rice plant in China. The draft genome will aid in epidemiological studies, comparative genomics, and quarantine of this broad-host-range pathogen.

Xie, Guanlin; Cui, Zhouqi; Tao, Zhongyun; Qiu, Hui; Liu, He; Zhu, Bo; Jin, Gulei; Sun, Guochang; Almoneafy, Abdulwareth

2012-01-01

157

Research ethics and the challenge of whole-genome sequencing  

Microsoft Academic Search

The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants' relatives; and

Amy L. McGuire; Mildred K. Cho; Timothy Caulfield

2007-01-01

158

Genome Sequence of Aedes aegypti, a Major Arbovirus Vector  

Microsoft Academic Search

We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ~1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of ~4 to

Vishvanath Nene; Jennifer R. Wortman; Daniel Lawson; Brian Haas; Chinnappa Kodira; Z. Tu; Brendan Loftus; Zhiyong Xi; Karyn Megy; Manfred Grabherr; Quinghu Ren; E. M. Zdobnov; N. F. Lobo; K. S. Campbell; S. E. Brown; M. F. Bonaldo; Jingsong Zhu; S. P. Sinkins; D. G. Hogenkamp; Paolo Amedeo; Peter Arensburger; P. W. Atkinson; Shelby Bidwell; Jim Biedler; Ewan Birney; Robert V. Bruggner; Javier Costas; M. R. Coy; Jonathan Crabtree; Matt Crawford; Becky deBruyn; David DeCaprio; Karin Eiglmeier; Eric Eisenstadt; Hamza El-Dorry; W. M. Gelbart; S. L. Gomes; Martin Hammond; Linda I. Hannick; M. H. Holmes; J. R. Hogan; David Jaffe; J. S. Johnston; R. C. Kennedy; Hean Koo; Saul Kravitz; Evgenia V. Kriventseva; David Kulp; Kurt LaButti; Eduardo Lee; Song Li; Diane D. Lovin; Chunhong Mao; Evan Mauceli; C. F. M. Menck; J. R. Miller; Philip Montgomery; Akio Mori; A. L. Nascimento; H. F. Naveira; Chad Nusbaum; S. O'Leary; Joshua Orvis; Mihaela Pertea; Hadi Quesneville; K. R. Reidenbach; Yu-Hui Rogers; C. W. Roth; J. R. Schneider; Michael Schatz; Martin Shumway; Mario Stanke; E. O. Stinson; J. M. C. Tubio; J. P. VanZee; Sergio Verjovski-Almeida; Doreen Werner; Owen White; Stefan Wyder; Qiandong Zeng; Qi Zhao; Yongmei Zhao; C. A. Hill; A. S. Raikhel; M. B. Soares; D. L. Knudson; N. H. Lee; James Galagan; S. L. Salzberg; I. T. Paulsen; George Dimopoulos; F. H. Collins; Bruce Birren; C. M. Fraser-Liggett; D. W. Severson

2007-01-01

159

Draft genome sequence of the coccolithovirus Emiliania huxleyi virus 202.  

PubMed

Emiliania huxleyi virus 202 (EhV-202) is a member of the Coccolithoviridae, a group of viruses that infect the marine coccolithophorid Emiliania huxleyi. EhV-202 has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 407 kbp, consisting of 485 coding sequences (CDSs). Here we describe the genomic features of EhV-202, together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome. PMID:22282334

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2012-02-01

160

Draft genome sequence of the Coccolithovirus Emiliania huxleyi virus 203.  

PubMed

The Coccolithoviridae are a recently discovered group of viruses that infect the marine coccolithophorid Emiliania huxleyi. Emiliania huxleyi virus 203 (EhV-203) has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 400 kbp, consisting of 464 coding sequences (CDSs). Here we describe the genomic features of EhV-203 together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome. PMID:22106382

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2011-12-01

161

Complete genome sequence of Enterobacter aerogenes KCTC 2190.  

PubMed

This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs. PMID:22493190

Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

2012-05-01
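
As an illustration of the G + C statistic quoted in the record above (54.8 mol% for E. aerogenes KCTC 2190), the following is a minimal Python sketch of how such a percentage can be computed from a sequence string. The function name gc_content and the toy sequence are assumptions made for this example; they are not taken from the cited paper, whose actual tooling is not described here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Ambiguous symbols such as N are ignored, so the percentage is
    computed over unambiguous A, C, G, and T bases only.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Hypothetical toy sequence, not real genome data.
    toy = "ATGCGCGTTANNCG"
    print(f"G+C content: {gc_content(toy):.1f}%")
```

For a real genome the same function would be applied to the concatenated replicon sequences, for example after reading them from a FASTA file.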

162

Limitations of next-generation genome sequence assembly  

Microsoft Academic Search

High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de

Can Alkan; Saba Sajjadian; Evan E Eichler

2010-01-01

163

WHOLE GENOME SEQUENCE OF FUSARIUM GRAMINEARUM, LINEAGE 7  

Technology Transfer Automated Retrieval System (TEKTRAN)

We have generated a draft sequence assembly of the F. graminearum genome that is available on the web for download and query. The sequence is of high quality with the entire 36Mb assembly consisting of just 511 contigs (> 2kb) contained within 28 supercontigs (scaffolds). The second genome release...

164

Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny  

Microsoft Academic Search

Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance

ELISABETH A. HERNIOU; TERESA LUQUE; XINWEN CHEN; JUST M. VLAK; DOREEN WINSTANLEY; JENNIFER S. CORY; D. R. O'Reilly

2001-01-01

165

Draft Genome Sequence of Aspergillus oryzae Strain 3.042  

PubMed Central

Aspergillus oryzae is the most important fungus for the traditional fermentation in China and is particularly important in soy sauce fermentation. We report the 36,547,279-bp draft genome sequence of A. oryzae 3.042 and compare it to the published genome sequence of A. oryzae RIB40.

Zhao, Guozhong; Yao, Yunping; Qi, Wei; Wang, Chunling; Hou, Lihua; Zeng, Bin

2012-01-01

166

The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)  

Microsoft Academic Search

We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus).

Webb Miller; Daniela I. Drautz; Jan E. Janecka; Arthur M. Lesk; Aakrosh Ratan; Lynn P. Tomsho; Mike Packard; Yeting Zhang; Lindsay R. McClellan; Ji Qi; Fangqing Zhao; M. Thomas; P. Gilbert; Juan Luis Arsuaga; Daniel H. Huson; Kristofer M. Helgen; William J. Murphy; Anders Gotherstrom; Stephan C. Schuster

2009-01-01

167

Sequence Surveyor: Leveraging Overview for Scalable Genomic Alignment Visualization  

Microsoft Academic Search

Fig. 1. Sequence Surveyor visualizing 100 synthetic genomes generated by an evolution simulation. Each genome is mapped to a row and genes are ordered by position. Color encodes the position of the gene within the chosen reference sequence (top row, indicated by the green box). Genes are aggregated, with each block's texture reflecting the overall distribution of colors in that

Danielle Albers; Colin Dewey; Michael Gleicher

2011-01-01

168

Complete Genome Sequence of Enterobacter aerogenes KCTC 2190  

PubMed Central

This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon

2012-01-01

169

Genome Sequence of Alcaligenes sp. Strain HPC1271  

PubMed Central

We report a draft genome sequence of Alcaligenes sp. strain HPC1271, which demonstrates antimicrobial activity against multidrug-resistant bacteria. Antibiotic production by Alcaligenes has not been frequently reported, and hence, the availability of the genome sequence should enable us to explore new antibiotic-producing gene clusters.

Sagarkar, Sneha; Tanksale, Himgouri; Sharma, Nandita; Qureshi, Asifa; Khardenavis, Anshuman; Purohit, Hemant J.

2013-01-01

170

The first Irish genome and ways of improving sequence accuracy  

PubMed Central

Whole-genome sequencing of an Irish person reveals hundreds of thousands of novel genomic variants. Imputation using previously known information improves the accuracy of low-read-depth sequencing. See research article: http://genomebiology.com/2010/11/9/R91

2010-01-01

171

Distribution and intensity of constraint in mammalian genomic sequence  

Microsoft Academic Search

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures ~3.9 neutral substitutions per site and spans ~1.9 Mbp of the human genome. We

Gregory M. Cooper; Eric A. Stone; Eric D. Green; Serafim Batzoglou; Arend Sidow

2005-01-01

172

Initial sequencing and analysis of the human genome  

Microsoft Academic Search

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Eric S. Lander; Lauren M. Linton; Bruce Birren; Chad Nusbaum; Michael C. Zody; Jennifer Baldwin; Keri Devon; Ken Dewar; Michael Doyle; William FitzHugh; Roel Funke; Diane Gage; Katrina Harris; Andrew Heaford; John Howland; Lisa Kann; Jessica Lehoczky; Rosie LeVine; Paul McEwan; Kevin McKernan; James Meldrim; Jill P. Mesirov; Cher Miranda; William Morris; Jerome Naylor; Christina Raymond; Mark Rosetti; Ralph Santos; Andrew Sheridan; Carrie Sougnez; Nicole Stange-Thomann; Nikola Stojanovic; Aravind Subramanian; Dudley Wyman; Jane Rogers; John Sulston; Rachael Ainscough; Stephan Beck; David Bentley; John Burton; Christopher Clee; Nigel Carter; Alan Coulson; Rebecca Deadman; Panos Deloukas; Andrew Dunham; Ian Dunham; Richard Durbin; Lisa French; Darren Grafham; Simon Gregory; Tim Hubbard; Sean Humphray; Adrienne Hunt; Matthew Jones; Christine Lloyd; Amanda McMurray; Lucy Matthews; Simon Mercer; Sarah Milne; James C. Mullikin; Andrew Mungall; Robert Plumb; Mark Ross; Ratna Shownkeen; Sarah Sims; Robert H. Waterston; Richard K. Wilson; LaDeana W. Hillier; John D. McPherson; Marco A. Marra; Elaine R. Mardis; Lucinda A. Fulton; Asif T. Chinwalla; Kymberlie H. Pepin; Warren R. Gish; Stephanie L. Chissoe; Michael C. Wendl; Kim D. Delehaunty; Tracie L. Miner; Andrew Delehaunty; Jason B. Kramer; Lisa L. Cook; Robert S. Fulton; Douglas L. Johnson; Patrick J. Minx; Sandra W. Clifton; Trevor Hawkins; Elbert Branscomb; Paul Predki; Paul Richardson; Sarah Wenning; Tom Slezak; Norman Doggett; Jan-Fang Cheng; Anne Olsen; Susan Lucas; Christopher Elkin; Edward Uberbacher; Marvin Frazier; Richard A. Gibbs; Donna M. Muzny; Steven E. Scherer; John B. Bouck; Erica J. Sodergren; Kim C. Worley; Catherine M. Rives; James H. Gorrell; Michael L. Metzker; Susan L. Naylor; Raju S. Kucherlapati; David L. Nelson; George M. Weinstock; Yoshiyuki Sakaki; Asao Fujiyama; Masahira Hattori; Tetsushi Yada; Atsushi Toyoda; Takehiko Itoh; Chiharu Kawagoe; Hidemi Watanabe; Yasushi Totoki; Todd Taylor; Jean Weissenbach; Roland Heilig; William Saurin; Francois Artiguenave; Philippe Brottier; Thomas Bruls; Eric Pelletier; Catherine Robert; Patrick Wincker; Douglas R. Smith; Lynn Doucette-Stamm; Marc Rubenfield; Keith Weinstock; Hong Mei Lee; JoAnn Dubois; André Rosenthal; Matthias Platzer; Gerald Nyakatura; Stefan Taudien; Andreas Rump; Huanming Yang; Jun Yu; Jian Wang; Guyang Huang; Jun Gu; Leroy Hood; Lee Rowen; Anup Madan; Shizen Qin; Ronald W. Davis; Nancy A. Federspiel; A. Pia Abola; Michael J. Proctor; Richard M. Myers; Jeremy Schmutz; Mark Dickson; Jane Grimwood; David R. Cox; Maynard V. Olson; Rajinder Kaul; Christopher Raymond; Nobuyoshi Shimizu; Kazuhiko Kawasaki; Shinsei Minoshima; Glen A. Evans; Maria Athanasiou; Roger Schultz; Bruce A. Roe; Feng Chen; Huaqin Pan; Juliane Ramser; Hans Lehrach; Richard Reinhardt; W. Richard McCombie; Melissa de la Bastide; Neilay Dedhia; Helmut Blöcker; Klaus Hornischer; Gabriele Nordsiek; Richa Agarwala; L. Aravind; Jeffrey A. Bailey; Serafim Batzoglou; Ewan Birney; Peer Bork; Daniel G. Brown; Christopher B. Burge; Lorenzo Cerutti; Hsiu-Chuan Chen; Deanna Church; Michele Clamp; Richard R. Copley; Tobias Doerks; Sean R. Eddy; Evan E. Eichler; Terrence S. Furey; James Galagan; James G. R. Gilbert; Cyrus Harmon; Yoshihide Hayashizaki; David Haussler; Henning Hermjakob; Karsten Hokamp; Wonhee Jang; L. Steven Johnson; Thomas A. Jones; Simon Kasif; Arek Kaspryzk; Scot Kennedy; W. James Kent; Paul Kitts; Eugene V. Koonin; Ian Korf; David Kulp; Doron Lancet; Todd M. Lowe; Aoife McLysaght; Tarjei Mikkelsen; John V. Moran; Nicola Mulder; Victor J. Pollara; Chris P. Ponting; Greg Schuler; Jörg Schultz; Guy Slater; Arian F. A. Smit; Elia Stupka; Joseph Szustakowki; Danielle Thierry-Mieg; Jean Thierry-Mieg; Lukas Wagner; John Wallis; Raymond Wheeler; Alan Williams; Yuri I. Wolf; Kenneth H. Wolfe; Shiaw-Pyng Yang; Ru-Fang Yeh; Francis Collins; Mark S. Guyer; Jane Peterson; Adam Felsenfeld; Kris A. Wetterstrand; Aristides Patrinos; Michael J. Morgan

2001-01-01

173

Genome Sequence of the Pathogenic Bacterium Vibrio vulnificus Biotype 3.  

PubMed

We report the first genome sequence of the pathogenic Vibrio vulnificus biotype 3. This draft genome sequence of the environmental strain VVyb1(BT3), isolated in Israel, provides a representation of this newly emerged clonal group, which reveals higher similarity to the clinical strains of biotype 1 than to the environmental ones. PMID:23599289

Danin-Poleg, Yael; Elgavish, Sharona; Raz, Nili; Efimov, Vera; Kashi, Yechezkel

2013-04-18

174

Genome Sequence of the Pathogenic Bacterium Vibrio vulnificus Biotype 3  

PubMed Central

We report the first genome sequence of the pathogenic Vibrio vulnificus biotype 3. This draft genome sequence of the environmental strain VVyb1(BT3), isolated in Israel, provides a representation of this newly emerged clonal group, which reveals higher similarity to the clinical strains of biotype 1 than to the environmental ones.

Danin-Poleg, Yael; Elgavish, Sharona; Raz, Nili; Efimov, Vera

2013-01-01

175

Draft Genome Sequence of the Wolbachia Endosymbiont of Drosophila suzukii  

PubMed Central

Wolbachia is one of the most successful and abundant symbiotic bacteria in nature, infecting more than 40% of the terrestrial arthropod species. Here we report the draft genome sequence of a novel Wolbachia strain named “wSuzi” that was retrieved from the genome sequencing of its host, the invasive pest Drosophila suzukii.

Cestaro, Alessandro; Kaur, Rupinder; Pertot, Ilaria; Rota-Stabelli, Omar; Anfora, Gianfranco

2013-01-01

176

High-quality genome sequence of Pichia pastoris CBS7435  

Microsoft Academic Search

The methylotrophic yeast Pichia pastoris (Komagataella phaffii) CBS7435 is the parental strain of commonly used P. pastoris recombinant protein production hosts making it well suited for improving the understanding of associated genomic features. Here, we present a 9.35Mbp high-quality genome sequence of P. pastoris CBS7435 established by a combination of 454 and Illumina sequencing. An automatic annotation of the genome

Andreas Küberl; Jessica Schneider; Gerhard G. Thallinger; Ingund Anderl; Daniel Wibberg; Tanja Hajek; Sebastian Jaenicke; Karina Brinkrolf; Alexander Goesmann; Rafael Szczepanowski; Alfred Pühler; Helmut Schwab; Anton Glieder; Harald Pichler

2011-01-01

177

A Complete Sequence of the T. tengcongensis Genome  

Microsoft Academic Search

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4 T (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the

Qiyu Bao; Yuqing Tian; Wei Li; Zuyuan Xu; Zhenyu Xuan; Songnian Hu; Wei Dong; Jian Yang; Yanjiong Chen; Yanfen Xue; Yi Xu; Xiaoqin Lai; Li Huang; Xiuzhu Dong; Yanhe Ma; Lunjiang Ling; Huarong Tan; Runsheng Chen; Jian Wang; Jun Yu; Huanming Yang

2002-01-01

178

Low-pass sequencing for microbial comparative genomics  

Microsoft Academic Search

BACKGROUND: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1

Young Ah Goo; Jared Roach; Gustavo Glusman; Nitin S Baliga; Kerry Deutsch; Min Pan; Sean Kennedy; Shiladitya DasSarma; Wailap Victor Ng; Leroy Hood

2004-01-01

179

Genome sequence of the human malaria parasite Plasmodium falciparum  

Microsoft Academic Search

The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date.

Malcolm J. Gardner; Neil Hall; Eula Fung; Owen White; Matthew Berriman; Richard W. Hyman; Jane M. Carlton; Arnab Pain; Sharen Bowman; Ian T. Paulsen; Keith James; Kim Rutherford; Steven L. Salzberg; Alister Craig; Sue Kyes; Man-Suen Chan; Vishvanath Nene; Shamira J. Shallom; Bernard Suh; Jeremy Peterson; Sam Angiuoli; Mihaela Pertea; Jonathan Allen; Jeremy Selengut; Daniel Haft; Michael W. Mather; Akhil B. Vaidya; Alan H. Fairlamb; Martin J. Fraunholz; David S. Roos; Stuart A. Ralph; Geoffrey I. McFadden; Leda M. Cummings; G. Mani Subramanian; Chris Mungall; J. Craig Venter; Daniel J. Carucci; Stephen L. Hoffman; Chris Newbold; Ronald W. Davis; Claire M. Fraser; Bart Barrell

2002-01-01

180

Sequencing Initiative at the Norris Cotton Cancer Center  

PubMed Central

The Dartmouth Genomics Shared Resource recently purchased the Ion Torrent Personal Genome Machine (PGM) and the Ion Proton with contributions from the Norris Cotton Cancer Center (NCCC), Geisel School of Medicine and the Institute for Quantitative Biomedical Sciences. The transition to Ion Torrent deep sequencing was relatively smooth and the workflows easily established. In collaboration with the NCCC, we are offering NCCC investigators an initiative to encourage deep sequencing and translational research. Investigators can choose one of two cancer panels: the Ion Torrent hotspot cancer panel (50 genes), and a custom-designed cancer gene panel (541 genes). The 541-cancer gene panel includes the desired genes from every NCCC investigator, which covers a broad spectrum of cancers and signaling pathways. The 541-cancer gene panel was designed using the Haloplex system (Agilent, Santa Clara, CA). We have validated extraction of DNA from both formalin-fixed paraffin-embedded (FFPE) and fresh frozen tissues to offer clinicians and researchers options for sample collection. Data are presented from the hotspot cancer gene panel using DNA obtained from FFPE and frozen breast cancer tissues.

Shipman, S.; Trask, H.; Lytle, C.; Taylor, W.; Moore, J.; Tomlinson, C.; Kerley-Hamilton, Joanna

2013-01-01

181

Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14.  

PubMed

We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified. PMID:23788546

Stahl, Buffy; Barrangou, Rodolphe

2013-06-20

182

Complete Chloroplast Genome Sequence of Glycine max and Comparative Analyses with other Legume Genomes  

Microsoft Academic Search

Lack of complete chloroplast genome sequences is still one of the major limitations to extending chloroplast genetic engineering technology to useful crops. Therefore, we sequenced the soybean chloroplast genome and compared it to the other completely sequenced legumes, Lotus and Medicago. The chloroplast genome of Glycine is 152,218 basepairs (bp) in length, including a pair of inverted repeats of 25,574 bp

Christopher Saski; Seung-Bum Lee; Henry Daniell; Todd C. Wood; Jeffrey Tomkins; Hyi-Gyung Kim; Robert K. Jansen

2005-01-01

183

Sequences Associated with Centromere Competency in the Human Genome  

PubMed Central

Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance.

Hayden, Karen E.; Strome, Erin D.; Merrett, Stephanie L.; Lee, Hye-Ran; Rudd, M. Katharine

2013-01-01

184

The genome sequence of Schizosaccharomyces pombe.  

PubMed

We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization. PMID:11859360

Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

2002-02-21

185

Looking to future of genome mapping, sequencing  

SciTech Connect

The human genome mapping and sequencing project is perhaps the prime example of an international project in medicine today. The project director, Nobelist James D. Watson, PhD, noted at the bicentennial conference that it may be possible to bring the cost down to as low as 50 cents a base pair without any enormous technological breakthroughs in the 10-nation effort. Another speaker, George Poste, PhD, DVM, DSc, head of research and development, Smith Kline & French Laboratories, Philadelphia, PA, predicted that completion of the genetic dictionary will lead to compilation of a protein dictionary for each cell type for use against disease. Anti-trust legislation, he said, is overtly ignored all the time in the defense industry because it is deemed to be in the national interest. However, Poste went on, the legislative bodies of the world do not yet understand the implications of the directions in which we are going in terms of Big Biology and the requirements for companies to be able to work together.

Kangilaski, J.

1989-07-21

186

Genomics:GTL Bioenergy Research Centers White Paper  

SciTech Connect

In his Advanced Energy Initiative announced in January 2006, President George W. Bush committed the nation to new efforts to develop alternative sources of energy to replace imported oil and fossil fuels. Developing cost-effective and energy-efficient methods of producing renewable alternative fuels such as cellulosic ethanol from biomass and solar-derived biofuels will require transformational breakthroughs in science and technology. Incremental improvements in current bioenergy production methods will not suffice. The Genomics:GTL Bioenergy Research Centers will be dedicated to fundamental research on microbe and plant systems with the goal of developing knowledge that will advance biotechnology-based strategies for biofuels production. The aim is to spur substantial progress toward cost-effective production of biologically based renewable energy sources. This document describes the rationale for the establishment of the centers and their objectives in light of the U.S. Department of Energy's mission and goals. Developing energy-efficient and cost-effective methods of producing alternative fuels such as cellulosic ethanol from biomass will require transformational breakthroughs in science and technology. Incremental improvements in current bioenergy-production methods will not suffice. The focus on microbes (for cellular mechanisms) and plants (for source biomass) fundamentally exploits capabilities well known to exist in the microbial world. Thus 'proof of concept' is not required, but considerable basic research into these capabilities remains an urgent priority. Several developments have converged in recent years to suggest that systems biology research into microbes and plants promises solutions that will overcome critical roadblocks on the path to cost-effective, large-scale production of cellulosic ethanol and other renewable energy from biomass. 
The ability to rapidly sequence the DNA of any organism is a critical part of these new capabilities, but it is only a first step. Other advances include the growing number of high-throughput techniques for protein production and characterization; a range of new instrumentation for observing proteins and other cell constituents; the rapid growth of commercially available reagents for protein production; a new generation of high-intensity light sources that provide precision imaging on the nanoscale and allow observation of molecular interactions in ultrafast time intervals; major advances in computational capability; and the continually increasing numbers of these instruments and technologies within the national laboratory infrastructure, at universities, and in private industry. All these developments expand our ability to elucidate mechanisms present in living cells, but much more remains to be done. The Centers are designed to accomplish GTL program objectives more rapidly, more effectively, and at reduced cost by concentrating appropriate technologies and scientific expertise, from genome sequence to an integrated systems understanding of the pathways and internal structures of microbes and plants most relevant to developing bioenergy compounds. The Centers will seek to understand the principles underlying the structural and functional design of selected microbial, plant, and molecular systems. This will be accomplished by building technological pathways linking the genome-determined components in an organism with bioenergy-relevant cellular systems that can be characterized sufficiently to generate realistic options for biofuel development. 
In addition, especially in addressing what are believed to be nearer-term approaches to renewable energy (e.g., producing cellulosic ethanol cost-effectively and energy-efficiently), the Center research team must understand in depth the current industrial-level roadblocks and bottlenecks (see section, GTL's Vision for Biological Energy Alternatives, below). For the Centers, and indeed the entire BER effort, to be successful, Center research must be integrated with individual investigator research, and coordination of activities,

Mansfield, Betty Kay [ORNL; Alton, Anita Jean [ORNL; Andrews, Shirley H [ORNL; Bownas, Jennifer Lynn [ORNL; Casey, Denise [ORNL; Martin, Sheryl A [ORNL; Mills, Marissa [ORNL; Nylander, Kim [ORNL; Wyrick, Judy M [ORNL; Drell, Dr. Daniel [Office of Science, Department of Energy; Weatherwax, Sharlene [U.S. Department of Energy; Carruthers, Julie [U.S. Department of Energy

2006-08-01

187

Community-wide analysis of microbial genome sequence signatures  

PubMed Central

Background: Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them. Results: We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases. Conclusions: An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.

Dick, Gregory J; Andersson, Anders F; Baker, Brett J; Simmons, Sheri L; Thomas, Brian C; Yelton, A Pepper; Banfield, Jillian F

2009-01-01
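
The genome signature in the record above is built from tetranucleotide frequencies. A minimal sketch of that first step follows, assuming a plain DNA string as input; the function name is hypothetical, and the emergent self-organizing map analysis the authors describe is not reproduced here.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq):
    """Map each of the 256 A/C/G/T tetranucleotides to its relative frequency.

    Windows containing any other character (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGT")
    print(len(freqs), freqs["ACGT"])
```

Each fragment then becomes a 256-dimensional vector that a clustering method can compare across fragments.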

188

Genome Sequence of the Trichosporon asahii Environmental Strain CBS 8904  

PubMed Central

This is the first report of the genome sequence of Trichosporon asahii environmental strain CBS 8904, which was isolated from maize cobs. Comparison of the genome sequence with that of clinical strain CBS 2479 revealed that they have >99% chromosomal and mitochondrial sequence identity, yet CBS 8904 has 368 specific genes. Analysis of clusters of orthologous groups predicted that 3,307 genes belong to 23 functional categories and 703 genes were predicted to have a general function.

Li, Hai Tao; Zhu, He; Zhou, Guang Peng; Wang, Meng; Wang, Lei

2012-01-01

189

Nucleotide sequence of the genomic RNA of bamboo mosaic potexvirus  

Microsoft Academic Search

The complete nucleotide sequence of the genomic RNA of bamboo mosaic virus (BaMV) was determined by sequencing a set of overlapping cDNA clones and by direct sequencing of the viral RNA. The RNA genome of BaMV is 6366 nucleotides long (excluding the 3' poly(A) tail) and contains six open reading frames (ORFs 1 to 6) coding for polypeptides with Mr values

Na-Sheng Lin; Biing-Yuan Lin; Neng-Wen Lo; Chung-Chi Hu; Teh-Yuan Chow; Yau-Heiu Hsu

1994-01-01

190

Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC  

Microsoft Academic Search

Comparative analysis of genomic sequences is a powerful approach to discover functional sites in these sequences. Herein, we present a WWW-based software system for multiple alignment of genomic sequences. We use the local alignment tool CHAOS to rapidly identify chains of pairwise similarities. These similarities are used as anchor points to speed up the DIALIGN multiple-alignment program. Finally, the visualization tool ABC is used for interactive graphical

Dirk Pöhler; Nadine Werner; Rasmus Steinkamp; Burkhard Morgenstern

2005-01-01

191

Data structures and compression algorithms for genomic sequence data  

PubMed Central

Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage, navigation and privacy of genomic data. Here, we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and protecting the data. Results: The general idea is to encode only the differences between a genome sequence and a reference sequence, using absolute or relative coordinates for the location of the differences. These locations and the corresponding differential variants can be encoded into binary strings using various entropy coding methods, from fixed codes such as Golomb and Elias codes, to variable codes, such as Huffman codes. We demonstrate the approach and various tradeoffs using highly variable human mitochondrial genome sequences as a testbed. With only a partial level of optimization, 3615 genome sequences occupying 56 MB in GenBank are compressed down to only 167 KB, achieving a 345-fold compression rate, using the revised Cambridge Reference Sequence as the reference sequence. Using the consensus sequence as the reference sequence, the data can be stored using only 133 KB, corresponding to a 433-fold level of compression, roughly a 23% improvement. Extensions to nuclear genomes and high-throughput sequencing data are discussed. Availability: Data are publicly available from GenBank, the HapMap web site, and the MITOMAP database. Supplementary materials with additional results, statistics, and software implementations are available from http://mammag.web.uci.edu/bin/view/Mitowiki/ProjectDNACompression. Contact: pfbaldi@ics.uci.edu

Brandon, Marty C.; Wallace, Douglas C.; Baldi, Pierre

2009-01-01
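
The idea in the record above, storing only differences from a reference and entropy-coding their relative positions, can be illustrated with a small sketch. It is not the authors' implementation: it assumes substitutions only (no insertions or deletions), equal-length sequences over A/C/G/T, Elias gamma codes for the gaps between differences, and a fixed 2-bit code per base. All function names are hypothetical.

```python
BASE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def elias_gamma(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode(genome, reference):
    """Encode the substitutions in genome relative to reference."""
    if len(genome) != len(reference):
        raise ValueError("this sketch handles substitutions only")
    bits = []
    previous = -1
    for pos, (g, r) in enumerate(zip(genome, reference)):
        if g != r:
            bits.append(elias_gamma(pos - previous))  # relative coordinate, >= 1
            bits.append(BASE_BITS[g])                 # the variant base
            previous = pos
    return "".join(bits)


def decode(bits, reference):
    """Rebuild the genome from the bit string and the reference."""
    genome = list(reference)
    i, pos = 0, -1
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":          # unary length prefix of the gamma code
            zeros += 1
            i += 1
        pos += int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        genome[pos] = BITS_BASE[bits[i:i + 2]]
        i += 2
    return "".join(genome)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    sample = "ACGTTCGTAA"
    packed = encode(sample, ref)
    print(packed, decode(packed, ref) == sample)
```

Relative coordinates keep the integers small when differences are clustered, which is what makes short codes such as Elias gamma effective.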

192

Whole-exome targeted sequencing of the uncharacterized pine genome.  

PubMed

The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of all probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm. PMID:23551702

Neves, Leandro G; Davis, John M; Barbazuk, William B; Kirst, Matias

2013-05-07

193

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data  

PubMed Central

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01
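
Read depth as used in the record above is conventionally estimated as (number of reads × read length) / genome size. The sketch below applies that standard relation to the 50X figure; the 100 bp read length is an assumption for illustration, since the abstract does not state one, and the function names are hypothetical.

```python
import math


def coverage_depth(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_depth(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Genome sizes in bases, taken from the abstract; read length is assumed.
    genomes = {"E. coli": 4_600_000, "S. kudriavzevii": 11_180_000, "C. elegans": 100_000_000}
    for name, size in genomes.items():
        print(name, reads_for_depth(50, 100, size))
```

The estimate ignores reads lost to quality filtering or duplicates, so a real experiment would usually plan for some margin above it.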

194

Savant: genome browser for high-throughput sequencing data  

PubMed Central

Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant Contact: savant@cs.toronto.edu

Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

2010-01-01

195

First draft genome sequence of the Japanese eel, Anguilla japonica.  

PubMed

The Japanese eel is a much appreciated research object and very important for Asian aquaculture; however, its genomic resources are still limited. We have used a streamlined bioinformatics pipeline for the de novo assembly of the genome sequence of the Japanese eel from raw Illumina sequence reads. The total assembled genome has a size of 1.15 Gbp, which is divided over 323,776 scaffolds with an N50 of 52,849 bp, a minimum scaffold size of 200 bp and a maximum scaffold size of 1.14 Mbp. Direct comparison of a representative set of scaffolds revealed that all the Hox genes and their intergenic distances are almost perfectly conserved between the European and the Japanese eel. The first draft genome sequence of an organism strongly catalyzes research progress in multiple fields. Therefore, the Japanese eel genome sequence will provide a rich resource of data for all scientists working on this important fish species. PMID:23026207

Henkel, Christiaan V; Dirks, Ron P; de Wijze, Daniëlle L; Minegishi, Yuki; Aoyama, Jun; Jansen, Hans J; Turner, Ben; Knudsen, Bjarne; Bundgaard, Martin; Hvam, Kenneth Lyneborg; Boetzer, Marten; Pirovano, Walter; Weltzien, Finn-Arne; Dufour, Sylvie; Tsukamoto, Katsumi; Spaink, Herman P; van den Thillart, Guido E E J M

2012-09-29
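
The N50 statistic quoted in the record above is the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly size. A minimal sketch with a hypothetical function name and toy input:

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the longest scaffolds cover at least half of the assembly.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the minimum scaffold size affects the total, N50 values are only comparable between assemblies filtered with the same length cutoff (200 bp in the abstract).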

196

Reference genome sequence of the model plant Setaria  

SciTech Connect

We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

Bennetzen, Jeffrey L [ORNL; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Tuskan, Gerald A [ORNL

2012-01-01

197

Reference genome sequence of the model plant Setaria  

SciTech Connect

We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

Bennetzen, Jeffrey L [ORNL; Schmutz, Jeremy [Hudson Alpha Institute of Biotechnology; Wang, Hao [University of Georgia, Athens, GA; Percifield, Ryan [University of Georgia, Athens, GA; Hawkins, Jennifer [University of Georgia, Athens, GA; Pontaroli, Ana C. [University of Georgia, Athens, GA; Estep, Matt [University of Georgia, Athens, GA; Feng, Liang [University of Georgia, Athens, GA; Vaughn, Justin N [ORNL; Grimwood, Jane [Hudson Alpha Institute of Biotechnology; Jenkins, Jerry [Hudson Alpha Institute of Biotechnology; Barry, Kerrie [U.S. Department of Energy, Joint Genome Institute; Lindquist, Erika [U.S. Department of Energy, Joint Genome Institute; Hellsten, Uffe [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Wang, Xuewen [University of Georgia, Athens, GA; Wu, Xiaomei [University of Georgia, Athens, GA; Mitros, Therese [University of California, Berkeley; Triplett, Jimmy [University of Missouri, St. Louis; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Mauro-Herrera, Margarita [Oklahoma State University; Wang, Lin [Cornell University; Li, Pinghua [Cornell University; Sharma, Manoj [University of California, Davis; Sharma, Rita [University of California, Davis; Ronald, Pamela [University of California, Davis; Panaud, Olivier [Universite de Perpignan, Perpignan, France; Kellogg, Elizabeth A. [University of Missouri, St. Louis; Brutnell, Thomas P. [Cornell University; Doust, Andrew N. [Oklahoma State University; Tuskan, Gerald A [ORNL; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute; Devos, Katrien M [ORNL

2012-01-01

198

Fuzzy Genome Sequence Assembly for Single and Environmental Genomes  

Microsoft Academic Search

Summary. Traditional methods obtain a microorganism's DNA by culturing it individually. Recent advances in genomics have led to the procurement of DNA of more than one organism from its natural habitat. Indeed, natural microbial communities are often very complex with tens and hundreds of species. Assembling these genomes is a crucial step irrespective of the method of obtaining

Sara Nasser; Adrienne Breland; Frederick C. Harris Jr.; Monica N. Nicolescu; Gregory L. Vert

2009-01-01

199

Comparative Analysis of Rice Genome Sequence to Understand the Molecular Basis of Genome Evolution  

Microsoft Academic Search

Accurate sequencing of the rice genome has ignited a passion for elucidating the mechanisms of sequence diversity among rice varieties and species, both in protein-coding regions and in genomic regions that are important for chromosome functions. Here, we have shown examples of sequence diversity in genic and non-genic regions. Sequence analysis of chromosome ends has revealed that there is diversity in

Jianzhong Wu; Hiroshi Mizuno; Takuji Sasaki; Takashi Matsumoto

2008-01-01

200

Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)  

SciTech Connect

Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as an energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

201

Complete genome sequence of Thermomonospora curvata type strain (B9)  

SciTech Connect

Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Chertkov, Olga [Los Alamos National Laboratory (LANL); Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Nolan, Matt [Joint Genome Institute, Walnut Creek, California; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California; Lucas, Susan [Joint Genome Institute, Walnut Creek, California; Glavina Del Rio, Tijana [Joint Genome Institute, Walnut Creek, California; Tice, Hope [Joint Genome Institute, Walnut Creek, California; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [Joint Genome Institute, Walnut Creek, California; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [Joint Genome Institute, Walnut Creek, California; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California; Ngatchou, Olivier Duplex [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Brettin, Thomas S [ORNL; Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [Joint Genome Institute, Walnut Creek, California; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [Joint Genome Institute, Walnut Creek, California; Bristow, James [Joint Genome Institute, Walnut Creek, California; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [Joint Genome Institute, Walnut Creek, California

2011-01-01

202

Complete genome sequence of Spirosoma linguale type strain (1T)  

SciTech Connect

Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Lail, Kathleen [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Schutze, Andrea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chen, Feng [U.S. Department of Energy, Joint Genome Institute

2010-01-01

203

Complete genome sequence of Gordonia bronchialis type strain (3410T)  

PubMed Central

Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Goker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

2010-01-01

204

Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)  

SciTech Connect

Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

2009-05-20

205

Curated list of prokaryote viruses with fully sequenced genomes  

Microsoft Academic Search

Genome sequencing is of enormous importance for classification of prokaryote viruses and for understanding the evolution of these viruses. This survey covers 284 sequenced viruses for which a full description has been published and for which the morphology is known. This corresponds to 219 (4%) of tailed and 75 (36%) of tailless viruses of prokaryotes. The number of sequenced tailless

Hans-W. Ackermann; Andrew M. Kropinski

2007-01-01

206

Cataloging Coding Sequence Variations in Human Genome Databases  

Microsoft Academic Search

Background: With the recent growth of information on sequence variations in the human genome, predictions regarding the functional effects and relevance to disease phenotypes of coding sequence variations are becoming increasingly important. The aims of this study were to catalog protein-coding sequence variations (CVs) occurring in genetic variation databases and to use bioinformatic programs to analyze CVs. In addition, we aim

Hong-Hee Won; Hee-Jin Kim; Kyung-A. Lee; Jong-Won Kim; Cecile Fairhead

2008-01-01

207

Indexing Huge Genome Sequences for Solving Various Problems  

Microsoft Academic Search

Because of the increase in the size of genome sequence databases, the importance of indexing the sequences for fast queries grows. Suffix trees and suffix arrays are used for simple queries. However, these are not suitable for complicated queries over a huge amount of sequences because the indices are stored on disk, which has slow access speed. We propose storing the

Kunihiko Sadakane; Tetsuo Shibuya

2001-01-01
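
The suffix-array indexing mentioned in the abstract above can be illustrated with a minimal in-memory sketch. It is not the method the authors propose (the abstract is truncated before that is described); it only shows the "simple query" case of finding all occurrences of a pattern. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is an assumption chosen for brevity, suitable only for short strings.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction by sorting the suffixes directly; adequate for
    short illustrative strings, not for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of `pattern` in `text`.

    Uses two binary searches over the suffix array to locate the block
    of suffixes that begin with `pattern`.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"  # toy sequence, not real data
    index = build_suffix_array(genome)
    print(find_occurrences(genome, index, "ATTA"))
```

Each query costs a logarithmic number of string comparisons once the index exists, which is why such indices are attractive for repeated lookups against a fixed sequence.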

208

Complete genome sequence of Thioalkalivibrio sp. K90mix.  

PubMed

Thioalkalivibrio sp. K90mix is an obligately chemolithoautotrophic, natronophilic sulfur-oxidizing bacterium (SOxB) belonging to the family Ectothiorhodospiraceae within the Gammaproteobacteria. The strain was isolated from a mixture of sediment samples obtained from different soda lakes located in the Kulunda Steppe (Altai, Russia) based on its extreme potassium carbonate tolerance as an enrichment method. Here we report the complete genome sequence of strain K90mix and its annotation. The genome was sequenced within the Joint Genome Institute Community Sequencing Program, because of its relevance to the sustainable removal of sulfide from wastewater and gas streams. PMID:22675584

Muyzer, Gerard; Sorokin, Dimitry Y; Mavromatis, Konstantinos; Lapidus, Alla; Foster, Brian; Sun, Hui; Ivanova, Natalia; Pati, Amrita; D'haeseleer, Patrik; Woyke, Tanja; Kyrpides, Nikos C

2011-12-23

209

Management of incidental findings in clinical genomic sequencing.  

PubMed

Genomic sequencing is becoming accurate, fast, and inexpensive, and is rapidly being incorporated into clinical practice. Incidental findings, which result in large numbers from genomic sequencing, are a potential barrier to the utility of this new technology due to their high prevalence and the lack of evidence or guidelines available to guide their clinical interpretation. This unit reviews the definition, classification, and management of incidental findings from genomic sequencing. The unit focuses on the clinical aspects of handling incidental findings, with an emphasis on the key role of clinical context in defining incidental findings and determining their clinical relevance and utility. PMID:23595601

Krier, Joel B; Green, Robert C

2013-01-01

210

Complete genome sequence of Staphylothermus hellenicus P8T  

SciTech Connect

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Davenport, Karen W. [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

211

Complete genome sequence of Staphylothermus hellenicus P8.  

PubMed

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8(T) is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project. PMID:22180806

Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne; Pitluck, Samuel; Davenport, Karen; Detter, John C; Han, Cliff; Tapia, Roxanne; Land, Miriam; Hauser, Loren; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos; Ivanova, Natalia

2011-09-23

212

Complete genome sequence of Thioalkalivibrio sp. K90mix  

PubMed Central

Thioalkalivibrio sp. K90mix is an obligately chemolithoautotrophic, natronophilic sulfur-oxidizing bacterium (SOxB) belonging to the family Ectothiorhodospiraceae within the Gammaproteobacteria. The strain was isolated from a mixture of sediment samples obtained from different soda lakes located in the Kulunda Steppe (Altai, Russia) based on its extreme potassium carbonate tolerance as an enrichment method. Here we report the complete genome sequence of strain K90mix and its annotation. The genome was sequenced within the Joint Genome Institute Community Sequencing Program, because of its relevance to the sustainable removal of sulfide from wastewater and gas streams.

Muyzer, Gerard; Sorokin, Dimitry Y.; Mavromatis, Konstantinos; Lapidus, Alla; Foster, Brian; Sun, Hui; Ivanova, Natalia; Pati, Amrita; D'haeseleer, Patrik; Woyke, Tanja; Kyrpides, Nikos C.

2011-01-01

213

Exploring Microbial Genome Sequences to Identify Protein Families on the Grid.  

National Technical Information Service (NTIS)

The analysis of microbial genome sequences can identify protein families that provide potential drug targets for new antibiotics. With the rapid accumulation of newly sequenced genomes, the analysis of complete genome sequences has become a computationall...

Y. Sun; A. Wipat; M. Pocock; P. Lee; K. Flanagan; J. Worthington

2005-01-01

214

Exon discovery by genomic sequence alignment  

Microsoft Academic Search

Motivation: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA' so local sequence similarity often indicates biological functionality. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that work by comparing anonymous

Burkhard Morgenstern; Oliver Rinner; Saïd Abdeddaïm; Dirk Haase; Klaus F. X. Mayer; Andreas W. M. Dress; Hans-werner Mewes

2002-01-01
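
The idea in the abstract above, that local sequence similarity between species hints at functional regions, can be sketched as a sliding-window identity scan over a pairwise alignment. This is an illustrative simplification and not the authors' gene-prediction method. The function name `conserved_windows`, the window size, and the identity threshold are all assumptions, and the two input strings are assumed to be already aligned (equal length, with `-` for gaps).

```python
from typing import List, Tuple


def conserved_windows(
    aligned_a: str,
    aligned_b: str,
    window: int = 10,
    min_identity: float = 0.8,
) -> List[Tuple[int, float]]:
    """Return (start, identity) for every window meeting the threshold.

    A column counts as a match only if both sequences carry the same
    non-gap character. Windows are reported by their alignment column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    hits: List[Tuple[int, float]] = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    # Toy alignment: conserved flanks around a diverged middle.
    seq_a = "ACGTACGTAA"
    seq_b = "ACGTTTTTAA"
    print(conserved_windows(seq_a, seq_b, window=4, min_identity=0.75))
```

Overlapping hits would normally be merged into longer candidate regions before any further analysis; that step is left out to keep the sketch short.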

215

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence  

Microsoft Academic Search

BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June

Joseph Cheung; Xavier Estivill; Razi Khaja; Jeffrey R MacDonald; Ken Lau; Lap-Chee Tsui; Stephen W Scherer

2003-01-01

216

Rapid Genome Evolution Revealed by Comparative Sequence Analysis of Orthologous Regions from Four Triticeae Genomes  

Microsoft Academic Search

Bread wheat (Triticum aestivum) is an allohexaploid species, consisting of three subgenomes (A, B, and D). To study the molecular evolution of these closely related genomes, we compared the sequence of a 307-kb physical contig covering the high molecular weight (HMW)-glutenin locus from the A genome of durum wheat (Triticum turgidum, AABB) with the orthologous regions from the B genome

Yong Qiang Gu; Devin Coleman-Derr; Xiuying Kong; Olin D. Anderson

2004-01-01

217

Characterizing and interpreting genetic variation from personal genome sequencing.  

PubMed

Since the completion of the human genome project, there has been enormous progress in the development of novel technologies for DNA sequencing. The advent of next-generation sequencing technologies now makes it possible to sequence an entire human genome in one or a few experiments. As a consequence, several individual human genomes have now been fully sequenced, using different experimental strategies. Although the protocols differ between the various sequencing technologies, the challenges of analyzing the data, calling variation, and interpreting the results are similar for all platforms. Here, we give an overview of the human genome sequencing projects completed to date. The strategies for aligning sequence reads and extracting information about different types of genetic variation from the sequence data are discussed. Identification of structural variation, such as copy number variation and insertion-deletion variants, can be complex, and there are a plethora of algorithms and analysis tools available. We also give an overview of the challenge of interpreting the whole-genome sequence data both from a technical and clinical perspective. PMID:22228021

Johansson, Anna C V; Feuk, Lars

2012-01-01

218

Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles  

Microsoft Academic Search

The combination of genome-wide expression patterns and full genome sequences offers a great opportunity to further our understanding of the mechanisms and logic of transcriptional regulation. Many methods have been described that identify sequence motifs enriched in transcription control regions of genes that share similar gene expression patterns. Here we present an alternative approach that evaluates the transcriptional information contained

Derek Y. Chiang; Patrick O. Brown; Michael B. Eisen

2001-01-01

219

The Arabidopsis lyrata genome sequence and the basis of rapid genome size change  

SciTech Connect

In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

2011-04-29

220

Assembly of large genomes using second-generation sequencing  

PubMed Central

Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.

Schatz, Michael C.; Delcher, Arthur L.; Salzberg, Steven L.

2010-01-01
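
As a rough illustration of the short-read assembly problem discussed in the abstract above, the sketch below builds a de Bruijn graph from reads. The abstract does not name specific algorithms, so the choice of a de Bruijn graph (a structure widely used by short-read assemblers) is an assumption made here, and the function name `build_de_bruijn` and the toy reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the sorted distinct (k-1)-mers that follow it.

    Every k-mer found in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    edges: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # overlapping toy reads, not real data
    for node, successors in sorted(build_de_bruijn(toy_reads, k=3).items()):
        print(node, "->", successors)
```

An assembler would go on to walk unbranched paths in this graph to produce contigs; error correction, repeat resolution, and paired-end information are what make the real problem hard and are not modeled here.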

221

Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome  

Microsoft Academic Search

BACKGROUND: The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences. RESULTS: Using a high-resolution annotation of TEs in Release 4 genome sequence, we

Casey M Bergman; Hadi Quesneville; Dominique Anxolabéhère; Michael Ashburner

2006-01-01

222

Complete genome sequence of Cellulomonas flavigena type strain (134T)  

SciTech Connect

Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Foster, Brian [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Sun, Hui [U.S. Department of Energy, Joint Genome Institute; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

223

Complete genome sequence of Haloterrigena turkmenica type strain (4k).  

PubMed

Haloterrigena turkmenica (Zvyagintseva and Tarasov 1987) Ventosa et al. 1999, comb. nov. is the type species of the genus Haloterrigena in the euryarchaeal family Halobacteriaceae. It is of phylogenetic interest because of the yet unclear position of the genera Haloterrigena and Natrinema within the Halobacteriaceae, which created some taxonomic problems historically. H. turkmenica, which was isolated from sulfate saline soil in Turkmenistan, is a relatively fast-growing, chemoorganotrophic, carotenoid-containing, extreme halophile, requiring at least 2 M NaCl for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Haloterrigena, but the eighth genome sequence from a member of the family Halobacteriaceae. The 5,440,782 bp genome (including six plasmids) with its 5,287 protein-coding and 63 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304683

Saunders, Elisabeth; Tindall, Brian J; Fähnrich, Regine; Lapidus, Alla; Copeland, Alex; Del Rio, Tijana Glavina; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

2010-02-28

224

Genome sequencing and analysis of the model grass Brachypodium distachyon  

SciTech Connect

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

Yang, Xiaohan [ORNL; Kalluri, Udaya C [ORNL; Tuskan, Gerald A [ORNL

2010-01-01

225

Genome analysis: A new approach for visualization of sequence organization in genomes  

Microsoft Academic Search

In this article we describe and demonstrate the versatility of a computer program, GENOME MAPPING, that uses interactive graphics and runs on an IRIS workstation. The program helps to visualize as well as analyse global and local patterns of genomic DNA sequences. It was developed keeping in mind the requirements of the human genome sequencing programme, which requires rapid analysis

Pradeep Kumar Burma; Alok Raj; Jayant K. Deb; Samir K. Brahmachari

1992-01-01

226

Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae  

Microsoft Academic Search

The sequences of the mitochondrial genomes of the oomycetes Phytophthora ramorum and P. sojae were determined during the course of complete nuclear genome sequencing (Tyler et al., Science, 313:1261, 2006). Both mitochondrial genomes are circular mapping, with sizes of 39,314 bp for P. ramorum and 42,977 bp for P. sojae. Each contains a total of 37 recognizable protein-encoding genes, 26 or 25 tRNAs (P.

Frank N. Martin; Douda Bensasson; Brett M. Tyler; Jeffrey L. Boore

2007-01-01

227

Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae  

Microsoft Academic Search

The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the other completely sequenced genomes identified genes specific to the streptococci and to S. agalactiae. These in silico analyses, combined

Hervé Tettelin; Vega Masignani; Michael J. Cieslewicz; Jonathan A. Eisen; Scott Peterson; Michael R. Wessels; Ian T. Paulsen; Karen E. Nelson; Immaculada Margarit; Timothy D. Read; Lawrence C. Madoff; Alex M. Wolf; Maureen J. Beanan; Lauren M. Brinkac; Sean C. Daugherty; Robert T. Deboy; A. Scott Durkin; James F. Kolonay; Ramana Madupu; Matthew R. Lewis; Diana Radune; Nadezhda B. Fedorova; David Scanlan; Hoda Khouri; Stephanie Mulligan; Heather A. Carty; Robin T. Cline; Susan E. van Aken; John Gill; Maria Scarselli; Marirosa Mora; Emilia T. Iacobini; Cecilia Brettoni; Giuliano Galli; Massimo Mariani; Filippo Vegni; Domenico Maione; Daniela Rinaudo; Rino Rappuoli; John L. Telford; Dennis L. Kasper; Guido Grandi; Claire M. Fraser

2002-01-01

228

High-quality genome sequence of Pichia pastoris CBS7435.  

PubMed

The methylotrophic yeast Pichia pastoris (Komagataella phaffii) CBS7435 is the parental strain of commonly used P. pastoris recombinant protein production hosts making it well suited for improving the understanding of associated genomic features. Here, we present a 9.35 Mbp high-quality genome sequence of P. pastoris CBS7435 established by a combination of 454 and Illumina sequencing. An automatic annotation of the genome sequence yielded 5007 protein-coding genes, 124 tRNAs and 29 rRNAs. Moreover, we report the complete DNA sequence of the first mitochondrial genome of a methylotrophic yeast. Fifteen genes encoding proteins, 2 rRNA and 25 tRNA loci were identified on the 35.7 kbp circular, mitochondrial DNA. Furthermore, the architecture of the putative alpha mating factor protein of P. pastoris CBS7435 turned out to be more complex than the corresponding protein of Saccharomyces cerevisiae. PMID:21575661

Küberl, Andreas; Schneider, Jessica; Thallinger, Gerhard G; Anderl, Ingund; Wibberg, Daniel; Hajek, Tanja; Jaenicke, Sebastian; Brinkrolf, Karina; Goesmann, Alexander; Szczepanowski, Rafael; Pühler, Alfred; Schwab, Helmut; Glieder, Anton; Pichler, Harald

2011-05-06

229

Genome Sequence of the Fish Pathogen Flavobacterium columnare ATCC 49512  

PubMed Central

Flavobacterium columnare is a Gram-negative, rod-shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512.

Tekedar, Hasan C.; Karsi, Attila; Gillaspy, Allison F.; Dyer, David W.; Benton, Nicole R.; Zaitshik, Jeremy; Vamenta, Stefanie; Banes, Michelle M.; Gulsoy, Nagihan; Aboko-Cole, Mary; Waldbieser, Geoffrey C.

2012-01-01

230

Genome Sequence of the Halophilic Archaeon Halococcus hamelinensis  

PubMed Central

Halococcus hamelinensis was isolated from hypersaline stromatolites in Shark Bay, Australia. Here we report the genome sequence (3,133,046 bp) of H. hamelinensis, which provides insights into the ecology, evolution, and adaptation of this novel microorganism.

Gudhka, Reema K.; Neilan, Brett A.

2012-01-01

231

Complete Genome Sequence of Pseudomonas denitrificans ATCC 13867  

PubMed Central

Pseudomonas denitrificans ATCC 13867, a Gram-negative facultative anaerobic bacterium, is known to produce vitamin B12 under aerobic conditions. This paper reports the annotated whole-genome sequence of the circular chromosome of this organism.

Ainala, Satish Kumar; Somasundar, Ashok

2013-01-01

232

Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines  

PubMed Central

New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

Li, Lijin; Goedegebuure, Peter; Mardis, Elaine R.; Ellis, Matthew J.C.; Zhang, Xiuli; Herndon, John M.; Fleming, Timothy P.; Carreno, Beatriz M.; Hansen, Ted H.; Gillanders, William E.

2011-01-01

233

Complete Genome Sequences of Six Strains of the Genus Methylobacterium  

SciTech Connect

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

Marx, Christopher J [Harvard University; Bringel, Francoise O. [University of Strasbourg; Christoserdova, Ludmila [University of Washington, Seattle; Moulin, Lionel [UMR, France; UI Hague, Muhammad Farhan [University of Strasbourg; Fleischman, Darrell E. [Wright State University, Dayton, OH; Gruffaz, Christelle [CNRS, Strasbourg, France; Jourand, Philippe [UMR, France; Knief, Claudia [ETH Zurich, Switzerland; Lee, Ming-Chun [Harvard University; Muller, Emilie E. L. [CNRS, Strasbourg, France; Nadalig, Thierry [CNRS, Strasbourg, France; Peyraud, Remi [ETH Zurich, Switzerland; Roselli, Sandro [CNRS, Strasbourg, France; Russ, Lina [ETH Zurich, Switzerland; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Ivanov, Pavel S. [University of Wyoming, Laramie; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lajus, Aurelie [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Land, Miriam L [ORNL; Medigue, Claudine [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Stolyar, Sergey [University of Washington; Vorholt, Julia A. [ETH Zurich, Switzerland; Vuilleumier, Stephane [University of Strasbourg

2012-01-01

234

Complete genome sequences of six strains of the genus methylobacterium  

SciTech Connect

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

Marx, Christopher J [Harvard University; Bringel, Francoise O. [University of Strasbourg; Christoserdova, Ludmila [University of Washington, Seattle; Moulin, Lionel [UMR, France; Farhan Ul Haque, Muhammad [CNRS, Strasbourg, France; Fleischman, Darrell E. [Wright State University, Dayton, OH; Gruffaz, Christelle [CNRS, Strasbourg, France; Jourand, Philippe [UMR, France; Knief, Claudia [ETH Zurich, Switzerland; Lee, Ming-Chun [Harvard University; Muller, Emilie E. L. [CNRS, Strasbourg, France; Nadalig, Thierry [CNRS, Strasbourg, France; Peyraud, Remi [ETH Zurich, Switzerland; Roselli, Sandro [CNRS, Strasbourg, France; Russ, Lina [ETH Zurich, Switzerland; Aguero, Fernan [Universidad Nacional de General San Martin; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lajus, Aurelie [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Land, Miriam L [ORNL; Medigue, Claudine [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Stolyar, Sergey [University of Washington; Vorholt, Julia A. [ETH Zurich, Switzerland; Vuilleumier, Stephane [University of Strasbourg

2012-01-01

235

Complete Genome Sequence of Rahnella aquatilis CIP 78.65  

PubMed Central

Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

Bruce, David; Detter, Chris; Goodwin, Lynne A.; Han, James; Han, Cliff S.; Held, Brittany; Land, Miriam L.; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

2012-01-01

236

Genome Sequence of the Immunomodulatory Strain Bifidobacterium bifidum LMG 13195  

PubMed Central

In this work, we report the genome sequences of Bifidobacterium bifidum strain LMG13195. Results from our research group show that this strain is able to interact with human immune cells, generating functional regulatory T cells.

Gueimonde, Miguel; Ventura, Marco; Margolles, Abelardo

2012-01-01

237

Draft Genome Sequence of Lactobacillus casei W56  

PubMed Central

We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products.

Hochwind, Kerstin; Weinmaier, Thomas; Schmid, Michael; van Hemert, Saskia; Hartmann, Anton; Rattei, Thomas

2012-01-01

238

Bacterial epidemiology and biology - lessons from genome sequencing  

PubMed Central

Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution.

2011-01-01

239

Sequencing of Chloroplast Genome Using Whole Cellular DNA and Solexa Sequencing Technology  

PubMed Central

Sequencing of the chloroplast (cp) genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the cp genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246, 362, and 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Micro-reads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8 or 95.5–99.7% of the B. rapa cp genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of cp genome.

Wu, Jian; Liu, Bo; Cheng, Feng; Ramchiary, Nirala; Choi, Su Ryun; Lim, Yong Pyo; Wang, Xiao-Wu

2012-01-01
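
The coverage percentages quoted in the chloroplast record above describe how much of the reference sequence is overlapped by at least one aligned read. A minimal sketch of that calculation follows. It is an illustration only, not the authors' pipeline: the function name `covered_fraction` and the half-open, 0-based interval convention are assumptions introduced here.

```python
def covered_fraction(ref_length, intervals):
    """Fraction of reference positions overlapped by at least one read.

    intervals: iterable of (start, end) alignment coordinates,
    0-based and half-open (hypothetical convention for this sketch).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each alignment to the bounds of the reference.
        start = max(start, 0)
        end = min(end, ref_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this alignment: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent alignment: extend the merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / ref_length


if __name__ == "__main__":
    # Toy example: three alignments on a 100-bp reference.
    print(covered_fraction(100, [(0, 40), (30, 60), (80, 100)]))
```

Merging overlapping intervals before counting avoids counting a position twice when several reads cover it, so the result measures breadth of coverage rather than depth.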

240

Sequencing of chloroplast genome using whole cellular DNA and solexa sequencing technology.  

PubMed

Sequencing of the chloroplast (cp) genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the cp genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246, 362, and 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Micro-reads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7-99.8 or 95.5-99.7% of the B. rapa cp genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of cp genome. PMID:23162558

Wu, Jian; Liu, Bo; Cheng, Feng; Ramchiary, Nirala; Choi, Su Ryun; Lim, Yong Pyo; Wang, Xiao-Wu

2012-11-08

241

Compressing Genomic Sequence Fragments Using SlimGene  

NASA Astrophysics Data System (ADS)

With the advent of next generation sequencing technologies, the cost of sequencing whole genomes is poised to go below $1000 per human individual in a few years. As more and more genomes are sequenced, analysis methods are undergoing rapid development, making it tempting to store sequencing data for long periods of time so that the data can be re-analyzed with the latest techniques. The challenging open research problems, huge influx of data, and rapidly improving analysis techniques have created the need to store and transfer very large volumes of data.

Kozanitis, Christos; Saunders, Chris; Kruglyak, Semyon; Bafna, Vineet; Varghese, George

242

Complete genome sequence of Treponema pallidum strain DAL-1  

PubMed Central

Treponema pallidum strain DAL-1 is a human uncultivable pathogen causing the sexually transmitted disease syphilis. Strain DAL-1 was isolated from the amniotic fluid of a pregnant woman in the secondary stage of syphilis. Here we describe the 1,139,971 bp long genome of T. pallidum strain DAL-1 which was sequenced using two independent sequencing methods (454 pyrosequencing and Illumina). In rabbits, strain DAL-1 replicated better than the T. pallidum strain Nichols. The comparison of the complete DAL-1 genome sequence with the Nichols sequence revealed a list of genetic differences that are potentially responsible for the increased rabbit virulence of the DAL-1 strain.

Zobanikova, Marie; Mikolka, Pavol; Cejkova, Darina; Pospisilova, Petra; Chen, Lei; Strouhal, Michal; Qin, Xiang; Weinstock, George M.; Smajs, David

2012-01-01

243

An integrated semiconductor device enabling non-optical genome sequencing  

Microsoft Academic Search

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing

Wolfgang Hinz; Todd M. Rearick; Jonathan Schultz; William Mileski; Mel Davey; John H. Leamon; Kim Johnson; Mark J. Milgrew; Matthew Edwards; Jeremy Hoon; Jan F. Simons; David Marran; Jason W. Myers; John F. Davidson; Annika Branting; John R. Nobile; Bernard P. Puc; David Light; Travis A. Clark; Martin Huber; Jeffrey T. Branciforte; Isaac B. Stoner; Simon E. Cawley; Michael Lyons; Yutao Fu; Nils Homer; Marina Sedova; Xin Miao; Brian Reed; Jeffrey Sabina; Erika Feierstein; Michelle Schorn; Mohammad Alanjary; Eileen Dimalanta; Devin Dressman; Rachel Kasinskas; Tanya Sokolsky; Jacqueline A. Fidanza; Eugeni Namsaraev; Kevin J. McKernan; Alan Williams; G. Thomas Roth; James Bustillo; Jonathan M. Rothberg

2011-01-01

244

A non-radioactive multiprime sequencing method for HIV genomes  

Microsoft Academic Search

A manual non-radioactive DNA sequencing protocol was developed for rapid analysis of variable HIV-1 genomes. Sets of up to ten primers were used in one sequencing reaction. After polyacrylamide gel electrophoresis and blotting onto nylon membranes the individual sequences were detected by hybridization with digoxigenin-labelled oligonucleotides and chemiluminescence. The method is applicable to any sequencing project where numerous variants of

Jutta Huber; Wolfgang Hell; Hans Wolf

1995-01-01

245

Intra-species sequence comparisons for annotating genomes  

SciTech Connect

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

2004-07-15

246

Complete Genome Sequence of Methanomassiliicoccus luminyensis, the Largest Genome of a Human-Associated Archaea Species  

PubMed Central

The present study describes the complete and annotated genome sequence of Methanomassiliicoccus luminyensis strain B10 (DSM 24529T, CSUR P135), which was isolated from human feces. The 2.6-Mb genome represents the largest genome of a methanogenic euryarchaeon isolated from humans. The genome data of M. luminyensis reveal unique features and horizontal gene transfer events, which might have occurred during its adaptation and/or evolution in the human ecosystem.

Gorlas, Aurore; Robert, Catherine; Gimenez, Gregory; Drancourt, Michel

2012-01-01

247

The Genomic HyperBrowser: inferential genomics at the sequence level  

PubMed Central

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.

2010-01-01

248

Microsatellite evolution inferred from human-chimpanzee genomic sequence alignments  

Microsoft Academic Search

Most studies of microsatellite evolution utilize long, highly mutable loci, which are unrepresentative of the majority of simple repeats in the human genome. Here we use an unbiased sample of 2,467 microsatellite loci derived from alignments of 5.1 Mb of genomic sequence from human and chimpanzee to investigate the mutation process of tandemly repetitive DNA. The results indicate that the

Matthew T. Webster; Nick G. C. Smith; Hans Ellegren

2002-01-01

249

Second Generation Sequencing of the Mesothelioma Tumor Genome  

Microsoft Academic Search

The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched

Raphael Bueno; Assunta de Rienzo; Lingsheng Dong; Gavin J. Gordon; Colin F. Hercus; William G. Richards; Roderick V. Jensen; Arif Anwar; Gautam Maulik; Lucian R. Chirieac; Kim-Fong Ho; Bruce E. Taillon; Cynthia L. Turcotte; Robert G. Hercus; Steven R. Gullans; David J. Sugarbaker; Anita Brandstaetter

2010-01-01

250

Insights into hominid evolution from the gorilla genome sequence  

Microsoft Academic Search

Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing

Aylwyn Scally; Julien Y. Dutheil; LaDeana W. Hillier; Gregory E. Jordan; Ian Goodhead; Javier Herrero; Asger Hobolth; Tuuli Lappalainen; Thomas Mailund; Tomas Marques-Bonet; Shane McCarthy; Stephen H. Montgomery; Petra C. Schwalie; Y. Amy Tang; Michelle C. Ward; Yali Xue; Bryndis Yngvadottir; Can Alkan; Lars N. Andersen; Qasim Ayub; Edward V. Ball; Kathryn Beal; Brenda J. Bradley; Yuan Chen; Chris M. Clee; Stephen Fitzgerald; Tina A. Graves; Yong Gu; Paul Heath; Andreas Heger; Emre Karakoc; Anja Kolb-Kokocinski; Gavin K. Laird; Gerton Lunter; Stephen Meader; Matthew Mort; James C. Mullikin; Kasper Munch; Timothy D. O’Connor; Andrew D. Phillips; Javier Prado-Martinez; Anthony S. Rogers; Saba Sajjadian; Dominic Schmidt; Katy Shaw; Jared T. Simpson; Peter D. Stenson; Daniel J. Turner; Linda Vigilant; Albert J. Vilella; Weldon Whitener; Baoli Zhu; David N. Cooper; Pieter de Jong; Emmanouil T. Dermitzakis; Evan E. Eichler; Paul Flicek; Nick Goldman; Nicholas I. Mundy; Zemin Ning; Duncan T. Odom; Chris P. Ponting; Michael A. Quail; Oliver A. Ryder; Stephen M. Searle; Wesley C. Warren; Richard K. Wilson; Mikkel H. Schierup; Jane Rogers; Chris Tyler-Smith; Richard Durbin

2012-01-01

251

Genome Sequence of Pectobacterium sp. Strain SCC3193  

PubMed Central

We report the complete and annotated genome sequence of the plant-pathogenic enterobacterium Pectobacterium sp. strain SCC3193, a model strain isolated from potato in Finland. The Pectobacterium sp. SCC3193 genome consists of a 516,411-bp chromosome, with no plasmids.

Koskinen, J. Patrik; Laine, Pia; Niemi, Outi; Nykyri, Johanna; Harjunpaa, Heidi; Auvinen, Petri; Paulin, Lars; Pirhonen, Minna; Palva, Tapio

2012-01-01

252

A snapshot of the emerging tomato genome sequence  

Technology Transfer Automated Retrieval System (TEKTRAN)

The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

253

Draft genome sequence of Paenibacillus peoriae strain KCTC 3763T.  

PubMed

Paenibacillus peoriae is a potentially plant-beneficial soil bacterium and is a close relative to Paenibacillus polymyxa, the type species of the genus Paenibacillus. Herein, we present the 5.77-Mb draft genome sequence of the P. peoriae type strain with the aim of providing insight into the genomic basis of plant growth-promoting Paenibacillus species. PMID:22328743

Jeong, Haeyoung; Choi, Soo-Keun; Park, Soo-Young; Kim, Sun Hong; Park, Seung-Hwan

2012-03-01

254

The Genomic Sequence of the Accidental Pathogen Legionella pneumophila  

Microsoft Academic Search

We present the genomic sequence of Legionella pneumophila, the bacterial agent of Legionnaires' disease, a potentially fatal pneumonia acquired from aerosolized contaminated fresh water. The genome includes a 45-kilobase pair element that can exist in chromosomal and episomal forms, selective expansions of important gene families, genes for unexpected metabolic pathways, and previously unknown candidate virulence determinants. We highlight the genes

Minchen Chien; Irina Morozova; Shundi Shi; Huitao Sheng; Jing Chen; Shawn M. Gomez; Gifty Asamani; Kendra Hill; John Nuara; Marc Feder; Justin Rineer; Joseph J. Greenberg; Valeria Steshenko; Samantha H. Park; Baohui Zhao; Elita Teplitskaya; John R. Edwards; Sergey Pampou; Anthi Georghiou; I.-Chun Chou; William Iannuccilli; Michael E. Ulz; Dae H. Kim; Alex Geringer-Sameth; Curtis Goldsberry; Pavel Morozov; Stuart G. Fischer; Gil Segal; Xiaoyan Qu; Andrey Rzhetsky; Peisen Zhang; Eftihia Cayanis; Pieter J. De Jong; Jingyue Ju; Sergey Kalachikov; Howard A. Shuman; James J. Russo

2004-01-01

255

Draft Genome Sequence of Avibacterium paragallinarum Strain 221.  

PubMed

Avibacterium paragallinarum is the causative agent of infectious coryza. Here we report the draft genome sequence of reference strain 221 of A. paragallinarum serovar A. The genome is composed of 135 contigs for 2,685,568 bp with a 41% G+C content. PMID:23704189

Xu, Fuzhou; Miao, Deyuan; Du, Yu; Chen, Xiaoling; Zhang, Peijun; Sun, Huiling

2013-05-23
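
Summary figures such as the total assembly size and G+C content reported in the Avibacterium record above can be derived directly from the contig sequences. The sketch below shows one way to compute them. The function names are hypothetical, and ignoring ambiguous bases such as N in the G+C denominator is an assumption made here, not something stated in the record.

```python
def assembly_length(contigs):
    """Total number of bases across all contigs."""
    return sum(len(seq) for seq in contigs)


def gc_content(contigs):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    total = 0
    gc = 0
    for seq in contigs:
        seq = seq.upper()
        # Count only unambiguous bases so that N runs do not dilute the ratio.
        total += sum(seq.count(base) for base in "ACGT")
        gc += seq.count("G") + seq.count("C")
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Toy contigs, not real sequence data.
    toy = ["GGCC", "AATT"]
    print(assembly_length(toy), gc_content(toy))
```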

256

A Cryptographic Approach to Securely Share and Query Genomic Sequences  

Microsoft Academic Search

To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we

Murat Kantarcioglu; Ying Liu; Bradley Malin

2008-01-01

257

Complete Genome Sequence of the Soil Actinomycete Kocuria rhizophila  

Microsoft Academic Search

The soil actinomycete Kocuria rhizophila belongs to the suborder Micrococcineae, a divergent bacterial group for which only a limited amount of genomic information is currently available. K. rhizophila is also important in industrial applications; e.g., it is commonly used as a standard quality control strain for antimicrobial susceptibility testing. Sequencing and annotation of the genome of K. rhizophila DC2201 (NBRC

Hiromi Takarada; Mitsuo Sekine; Hiroki Kosugi; Yasunori Matsuo; Takatomo Fujisawa; Seiha Omata; Emi Kishi; Ai Shimizu; Naofumi Tsukatani; Satoshi Tanikawa; Nobuyuki Fujita; Shigeaki Harayama

2008-01-01

258

Complete Genome Sequence of Cyanobacterial Siphovirus KBS2A.  

PubMed

We present the genome of a cyanosiphovirus (KBS2A) that infects a marine Synechococcus sp. (strain WH7803). Unique to this genome, relative to other sequenced cyanosiphoviruses, is the absence of elements associated with integration into the host chromosome, suggesting this virus may not be able to establish a lysogenic relationship. PMID:23969045

Ponsero, Alise J; Chen, Feng; Lennon, Jay T; Wilhelm, Steven W

2013-08-22

259

Complete Genome Sequence of Antarctic Bacterium Psychrobacter sp. Strain G.  

PubMed

Here, we report the complete genome sequence of Psychrobacter sp. strain G, isolated from King George Island, Antarctica, which can produce lipolytic enzymes at low temperatures. The genomics information of this strain will facilitate the study of the physiology, cold adaptation properties, and evolution of this genus. PMID:24051316

Che, Shuai; Song, Lai; Song, Weizhi; Yang, Meng; Liu, Guiming; Lin, Xuezheng

2013-09-19

260

Taxonomy becoming a driving force in genome sequencing projects.  

PubMed

We studied the possible impact of genomic projects by comparing the number of published articles before and after the completion of the project. We found that for most species, there is no significant change in the number of citations. Our study also highlights the growing importance of taxonomy as a main motivation for the sequencing of genomes. PMID:23453737

Tamames, Javier; Durante-Rodríguez, Gonzalo

2013-03-01

261

The genome sequence and structure of rice chromosome 1  

Microsoft Academic Search

The rice species Oryza sativa is considered to be a model plant because of its small genome size, extensive genetic map, relative ease of transformation and synteny with other cereal crops. Here we report the essentially complete sequence of chromosome 1, the longest chromosome in the rice genome. We summarize characteristics of the chromosome structure and the biological insight gained

Takuji Sasaki; Takashi Matsumoto; Kimiko Yamamoto; Katsumi Sakata; Tomoya Baba; Yuichi Katayose; Jianzhong Wu; Yoshihito Niimura; Zhukuan Cheng; Yoshiaki Nagamura; Baltazar A. Antonio; Hiroyuki Kanamori; Satomi Hosokawa; Masatoshi Masukawa; Koji Arikawa; Yoshino Chiden; Mika Hayashi; Masako Okamoto; Tsuyu Ando; Hiroyoshi Aoki; Kohei Arita; Masao Hamada; Chizuko Harada; Saori Hijishita; Mikiko Honda; Yoko Ichikawa; Atsuko Idonuma; Masumi Iijima; Michiko Ikeda; Maiko Ikeno; Sachie Ito; Tomoko Ito; Yuichi Ito; Yukiyo Ito; Aki Iwabuchi; Kozue Kamiya; Wataru Karasawa; Satoshi Katagiri; Ari Kikuta; Noriko Kobayashi; Izumi Kono; Kayo Machita; Tomoko Maehara; Hiroshi Mizuno; Tatsumi Mizubayashi; Yoshiyuki Mukai; Hideki Nagasaki; Marina Nakashima; Yuko Nakama; Yumi Nakamichi; Mari Nakamura; Nobukazu Namiki; Manami Negishi; Isamu Ohta; Nozomi Ono; Shoko Saji; Kumiko Sakai; Michie Shibata; Takanori Shimokawa; Ayahiko Shomura; Jianyu Song; Yuka Takazaki; Kimihiro Terasawa; Kumiko Tsuji; Kazunori Waki; Harumi Yamagata; Hiroko Yamane; Shoji Yoshiki; Rie Yoshihara; Kazuko Yukawa; Huisun Zhong; Hisakazu Iwama; Toshinori Endo; Hidetaka Ito; Jang Ho Hahn; Ho-Il Kim; Moo-Young Eun; Masahiro Yano; Jiming Jiang; Takashi Gojobori

2002-01-01

262

Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus  

Microsoft Academic Search

The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The CG content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine-initiated open reading frames (ORFs) with more than 50 amino acids and

Alejandra Garcia-Maruniak; James E. Maruniak; Paolo M. A. Zanotto; Aissa E. Doumbouya; Jaw-Ching Liu; Thomas M. Merritt; Jennifer S. Lanoie

2004-01-01
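
The NeseNPV record above counts methionine-initiated open reading frames longer than 50 amino acids. A simplified sketch of such a scan is given below. It is not the authors' annotation method: it searches only the three forward-strand reading frames, uses the standard stop codons, and the names `find_orfs` and `min_aa` are introduced here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=51):
    """Return (start, end) spans of ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based and half-open, and the end includes the stop
    codon. min_aa=51 keeps ORFs with more than 50 amino acids (assumption:
    the initiating methionine is counted, the stop codon is not).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG after the previous stop codon
            elif codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop
                if n_aa >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG followed by 50 alanine codons and a stop codon.
    toy = "ATG" + "GCT" * 50 + "TAA"
    print(find_orfs(toy))
```

A fuller scan would also examine the reverse complement and would normally be followed by homology searches to decide which candidate ORFs are likely genes.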

263

Triticeae genomics: advances in sequence analysis of large genome cereal crops.  

PubMed

Whole genome sequencing provides direct access to all genes of an organism and represents an essential step towards a systematic understanding of (crop) plant biology. Wheat and barley, two of the most important crop species worldwide, have two- to five-fold larger genomes than human - too large to be completely sequenced at current costs. Nevertheless, significant progress has been made to unlock the gene contents of these species by sequencing expressed sequence tags (EST) for high-density mapping and as a basis for elucidating gene function on a large scale. Several megabases of genomic (BAC) sequences have been obtained providing a first insight into the complexity of these huge cereal genomes. However, to fully exploit the information of the wheat and barley genomes for crop improvement, sequence analysis of a significantly larger portion of the Triticeae genomes is needed. In this review an overview of the current status of Triticeae genome sequencing and a perspective concerning future developments in cereal structural genomics is provided. PMID:17295124

Stein, Nils

2007-01-01

264

Genome sequence of the biocontrol strain Pseudomonas fluorescens F113.  

PubMed

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A; Giddens, Stephen R; Coppoolse, Eric R; Muriel, Candela; Stiekema, Willem J; Rainey, Paul B; Dowling, David; O'Gara, Fergal; Martín, Marta; Rivilla, Rafael

2012-03-01

265

Complete Genome Sequences of Novel Rat Noroviruses in Hong Kong  

PubMed Central

We report two genome sequences of novel noroviruses isolated from fecal swab specimens of brown rats in Hong Kong. The complete genome is approximately 7.5 kb in length and consists of 3 overlapping open reading frames encoding ORF1 polyprotein, VP1, and VP2, respectively. Sequence analysis suggested that these noroviruses should be classified in genogroup V, but they are distinct from other known rodent noroviruses and represent a novel cluster within the genogroup.

Tse, Herman; Chan, Wan-Mui; Lam, Carol S. F.; Lau, Susanna K. P.; Woo, Patrick C. Y.

2012-01-01

266

Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113  

PubMed Central

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms.

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martinez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sanchez-Contreras, Maria; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martin, Marta

2012-01-01

267

Complete Genome Sequence of Bacillus cereus Bacteriophage PBC1  

PubMed Central

Bacillus cereus is a ubiquitous, spore-forming bacterium associated with food poisoning cases. To develop an efficient biocontrol agent against B. cereus, we isolated lytic phage PBC1 and sequenced its genome. PBC1 showed a very low degree of homology to previously reported phages, implying that it is novel. Here we report the complete genome sequence of PBC1 and describe major findings from our analysis.

Kong, Minsuk; Kim, Minsik

2012-01-01

268

The Genome Sequence of the SARS-Associated Coronavirus  

Microsoft Academic Search

We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously

Marco A. Marra; Steven J. M. Jones; Caroline R. Astell; Robert A. Holt; Angela Brooks-Wilson; Yaron S. N. Butterfield; Jaswinder Khattra; Jennifer K. Asano; Sarah A. Barber; Susanna Y. Chan; Alison Cloutier; Shaun M. Coughlin; Doug Freeman; Noreen Girn; Obi L. Griffith; Stephen R. Leach; Michael Mayo; Helen McDonald; Stephen B. Montgomery; Pawan K. Pandoh; Anca S. Petrescu; A. Gordon Robertson; Jacqueline E. Schein; Asim Siddiqui; Duane E. Smailus; Jeff M. Stott; George S. Yang; Francis Plummer; Anton Andonov; Harvey Artsob; Nathalie Bastien; Kathy Bernard; Timothy F. Booth; Donnie Bowness; Michael Drebot; Lisa Fernando; Ramon Flick; Michael Garbutt; Michael Garbutt; Allen Grolla; Heinz Feldmann; Adrienne Meyers; Amin Kabani; Yan Li; Susan Normand; Ute Stroher; Graham A. Tipples; Shaun Tyler; Robert Vogrig; Diane Ward; Robert C. Brunham; Mel Krajden; Martin Petric; Danuta M. Skowronski; Chris Upton; Rachel L. Roper

2003-01-01

269

Genome Sequence of Pantoea agglomerans Strain IG1  

PubMed Central

Pantoea agglomerans is a Gram-negative bacterium that grows symbiotically with various plants. Here we report the 4.8-Mb genome sequence of P. agglomerans strain IG1. The lipopolysaccharides derived from P. agglomerans IG1 have been shown to be effective in the prevention of various diseases, such as bacterial or viral infections and lifestyle-related diseases. This genome sequence represents a substantial step toward the elucidation of pathways for production of lipopolysaccharides.

Matsuzawa, Tomohiko; Mori, Kazuki; Kadowaki, Takeshi; Shimada, Misato; Tashiro, Kosuke; Kuhara, Satoru; Inagawa, Hiroyuki; Soma, Gen-ichiro

2012-01-01

270

Genome sequence of Pantoea agglomerans strain IG1.  

PubMed

Pantoea agglomerans is a gram-negative bacterium that grows symbiotically with various plants. Here we report the 4.8-Mb genome sequence of P. agglomerans strain IG1. The lipopolysaccharides derived from P. agglomerans IG1 have been shown to be effective in the prevention of various diseases, such as bacterial or viral infections and lifestyle-related diseases. This genome sequence represents a substantial step toward the elucidation of pathways for production of lipopolysaccharides. PMID:22328756

Matsuzawa, Tomohiko; Mori, Kazuki; Kadowaki, Takeshi; Shimada, Misato; Tashiro, Kosuke; Kuhara, Satoru; Inagawa, Hiroyuki; Soma, Gen-ichiro; Takegawa, Kaoru

2012-03-01

271

Complete Genome Sequence of Bifidobacterium bifidum S17  

PubMed Central

Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome sequence will provide new insights into the biology of this potential probiotic organism and allow for the characterization of the molecular mechanisms underlying its beneficial properties.

Zhurina, Daria; Zomer, Aldert; Gleinser, Marita; Brancaccio, Vincenco Francesco; Auchter, Marc; Waidmann, Mark S.; Westermann, Christina; van Sinderen, Douwe; Riedel, Christian U.

2011-01-01

272

Complete genome sequence of Bifidobacterium bifidum S17.  

PubMed

Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome sequence will provide new insights into the biology of this potential probiotic organism and allow for the characterization of the molecular mechanisms underlying its beneficial properties. PMID:21037011

Zhurina, Daria; Zomer, Aldert; Gleinser, Marita; Brancaccio, Vincenco Francesco; Auchter, Marc; Waidmann, Mark S; Westermann, Christina; van Sinderen, Douwe; Riedel, Christian U

2010-10-29

273

Sequencing viral genomes from a single isolated plaque  

PubMed Central

Background: Whole genome sequencing of viruses and bacteriophages is often hindered because of the need for large quantities of genomic material. A method is described that combines single plaque sequencing with an optimization of Sequence Independent Single Primer Amplification (SISPA). This method can be used for de novo whole genome next-generation sequencing of any cultivable virus without the need for large-scale production of viral stocks or viral purification using centrifugal techniques. Methods: A single viral plaque of a variant of the 2009 pandemic H1N1 human Influenza A virus was isolated and amplified using the optimized SISPA protocol. The sensitivity of the SISPA protocol presented here was tested with bacteriophage F_HA0480sp/Pa1651 DNA. The amplified products were sequenced with 454 and Illumina HiSeq platforms. Mapping and de novo assemblies were performed to analyze the quality of data produced from this optimized method. Results: Analysis of the sequence data demonstrated that from a single viral plaque of Influenza A, a mapping assembly with 3590-fold average coverage representing 100% of the genome could be produced. The de novo assembled data produced contigs with 30-fold average sequence coverage, representing 96.5% of the genome. Using only 10 pg of starting DNA from bacteriophage F_HA0480sp/Pa1651 in the SISPA protocol resulted in sequencing data that gave a mapping assembly with 3488-fold average sequence coverage, representing 99.9% of the reference and a de novo assembly with 45-fold average sequence coverage, representing 98.1% of the genome. Conclusions: The optimized SISPA protocol presented here produces amplified product that when sequenced will give high quality data that can be used for de novo assembly. The protocol requires only a single viral plaque or as little as 10 pg of DNA template, which will facilitate rapid identification of viruses during an outbreak and viruses that are difficult to propagate.

2013-01-01

274

Large-Scale Sequencing: The Future of Genomic Sciences Colloquium  

SciTech Connect

Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. 
Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which

Margaret Riley; Merry Buckley

2009-01-01

275

Mitochondrial Genome Sequence of the Legume Vicia faba  

PubMed Central

The number of plant mitochondrial genomes sequenced exceeds two dozen. However, for a detailed comparative study of different phylogenetic branches more plant mitochondrial genomes should be sequenced. This article presents sequencing data and comparative analysis of mitochondrial DNA (mtDNA) of the legume Vicia faba. The size of the V. faba circular mitochondrial master chromosome of cultivar Broad Windsor was estimated as 588,000 bp with a genome complexity of 387,745 bp and 52 conservative mitochondrial genes; 32 of them encoding proteins, 3 rRNA, and 17 tRNA genes. Six tRNA genes were highly homologous to chloroplast genome sequences. In addition to the 52 conservative genes, 114 unique open reading frames (ORFs) were found: 36 without significant homology to any known proteins and 29 with homology to the Medicago truncatula nuclear genome and to other plant mitochondrial ORFs; 49 ORFs were not homologous to M. truncatula but possessed sequences with significant homology to other plant mitochondrial or nuclear ORFs. In general, the unique ORFs revealed very low homology to known closely related legumes, but several sequence homologies were found between V. faba, Beta vulgaris, Nicotiana tabacum, Vitis vinifera, and even the monocots Oryza sativa and Zea mays. Most likely these ORFs arose independently during angiosperm evolution (Kubo and Mikami, 2007; Kubo and Newton, 2008). Computational analysis revealed in total about 45% of V. faba mtDNA sequence being homologous to the Medicago truncatula nuclear genome (more than to any sequenced plant mitochondrial genome), and 35% of this homology, ranging from a few dozen to 12,806 bp, is located on chromosome 1. Apparently, mitochondrial rrn5, rrn18, rps10, ATP synthase subunit alpha, cox2, and tRNA sequences are part of transcribed nuclear mosaic ORFs.

Negruk, Valentine

2013-01-01
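
The gene and ORF counts quoted in the Vicia faba abstract above can be cross-checked with a few lines of arithmetic. The following is only a reader's sketch in Python: the dictionary keys and variable names are illustrative labels chosen here, not terms from the article, and the ratio of genome complexity to master-chromosome size is simply computed from the two quoted figures.

```python
# Figures copied from the abstract; all names below are illustrative labels.
MASTER_CHROMOSOME_BP = 588_000   # estimated size of the master chromosome
GENOME_COMPLEXITY_BP = 387_745   # quoted genome complexity

conserved_genes = {"protein-coding": 32, "rRNA": 3, "tRNA": 17}
unique_orfs = {
    "no homology to known proteins": 36,
    "homologous to M. truncatula nuclear genome": 29,
    "homologous to other plant ORFs only": 49,
}


def total(counts):
    """Sum the per-category counts in a dictionary."""
    return sum(counts.values())


# Fraction of the master chromosome accounted for by the quoted complexity.
complexity_fraction = GENOME_COMPLEXITY_BP / MASTER_CHROMOSOME_BP

if __name__ == "__main__":
    print("conserved genes:", total(conserved_genes))
    print("unique ORFs:", total(unique_orfs))
    print(f"complexity / master chromosome: {complexity_fraction:.1%}")
```

The category totals agree with the 52 conservative genes and 114 unique ORFs stated in the abstract.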

276

Genome sequence of the date palm Phoenix dactylifera L.  

PubMed

Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm's unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants. PMID:23917264

Al-Mssallem, Ibrahim S; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O; Jia, Shangang; Yin, An; Alhuzimi, Eman M; Alsaihati, Burair A; Al-Owayyed, Saad A; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A; Sun, Gaoyuan; Majrashi, Majed A; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

2013-01-01

277

Complete chloroplast genome sequences of Solanum bulbocastanum , Solanum lycopersicum and comparative analyses with other Solanaceae genomes  

Microsoft Academic Search

Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and

Henry Daniell; Seung-Bum Lee; Justin Grevich; Christopher Saski; Tania Quesada-Vargas; Chittibabu Guda; Jeffrey Tomkins; Robert K. Jansen

2006-01-01

278

Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center  

SciTech Connect

Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

2007-09-02

279

Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes  

PubMed Central

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.

Perkins, Timothy T.; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

2013-01-01

280

Choosing a benchtop sequencing machine to characterise Helicobacter pylori genomes.  

PubMed

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori. PMID:23840736

Perkins, Timothy T; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

2013-06-28

281

RESTseq--efficient benchtop population genomics with RESTriction Fragment SEQuencing.  

PubMed

We present RESTseq, an improved approach for a cost-efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small-scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping, from fine-scale linkage mapping to population genomics and population genetics, including in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples. PMID:23691128

Stolle, Eckart; Moritz, Robin F A

2013-05-17

282

Complete genome sequence of Ferroglobus placidus AEDII12DO  

SciTech Connect

Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Risso, Carla [University of Massachusetts, Amherst; Holmes, Dawn [University of Massachusetts, Amherst; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lovley, Derek [University of Massachusetts, Amherst; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

283

Complete genome sequence of Serratia plymuthica strain AS12.  

PubMed

A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled "Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens". PMID:22768360

Neupane, Saraswoti; Finlay, Roger D; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

2012-05-01

284

Complete genome sequence of Serratia plymuthica strain AS12  

PubMed Central

A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”.

Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

2012-01-01

285

RESTseq - Efficient Benchtop Population Genomics with RESTriction Fragment SEQuencing  

PubMed Central

We present RESTseq, an improved approach for a cost-efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small-scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping, from fine-scale linkage mapping to population genomics and population genetics, including in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples.

Stolle, Eckart; Moritz, Robin F. A.

2013-01-01

286

Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome  

PubMed Central

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with an average spacing of once every 5,000 bases. A total of 339,243 bases of unique sequence was generated (approximately 7% representation). The sample of 870 sequences was compared to the complete Escherichia coli K-12 genome and to the rest of the GenBank database, which can also be considered a collection of sampled sequences. Despite the incomplete S. typhi data set, interesting categories could easily be discerned. Sixteen percent of the sequences determined from S. typhi had close homologs among known Salmonella sequences (P < 1e-40 in BlastX or BlastN), reflecting the proportion of these genomes that have been sequenced previously; 277 sequences (32%) had no apparent orthologs in the complete E. coli K-12 genome (P > 1e-20), of which 155 sequences (18%) had no close similarities to any sequence in the database (P > 1e-5). Eight of the 277 sequences had similarities to genes in other strains of E. coli or plasmids, and six sequences showed evidence of novel phage lysogens or sequence remnants of phage integrations, including a member of the lambda family (P < 1e-15). Twenty-three sample sequences had a significantly closer similarity to a sequence in the database from organisms other than the E. coli/Salmonella clade (which includes Shigella and Citrobacter). These sequences are new candidate lateral transfer events to the S. typhi lineage or deletions on the E. coli K-12 lineage. Eleven putative junctions of insertion/deletion events greater than 100 bp were observed in the sample, indicating that well over 150 such events may distinguish S. typhi from E. coli K-12. The need for automatic methods to more effectively exploit sample sequences is discussed.

McClelland, Michael; Wilson, Richard K.

1998-01-01
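
The sampling arithmetic in the Salmonella typhi abstract above can be retraced with a short calculation. This is a rough reader's sketch in Python, not the authors' analysis: it uses only the figures quoted in the abstract, and because the representation is given as "approximately 7%", every extrapolated value is approximate. The function and constant names are illustrative.

```python
# Figures quoted in the abstract; names are illustrative labels.
N_READS = 870               # sequenced M13 clones
UNIQUE_BASES = 339_243      # unique sequence generated
REPRESENTATION = 0.07       # "approximately 7%", so results below are rough
NO_ECOLI_ORTHOLOG = 277     # reads with no apparent E. coli K-12 ortholog
NO_DATABASE_MATCH = 155     # of those, reads with no close database match
INDEL_JUNCTIONS = 11        # putative insertion/deletion junctions > 100 bp


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def extrapolate(count_in_sample, sampled_fraction):
    """Scale a count seen in a sample up to the whole genome,
    assuming the sample is representative of the genome."""
    return count_in_sample / sampled_fraction


# Genome size implied by the quoted unique bases and representation.
implied_genome_size = extrapolate(UNIQUE_BASES, REPRESENTATION)

if __name__ == "__main__":
    print(f"no E. coli ortholog: {percent(NO_ECOLI_ORTHOLOG, N_READS):.1f}%")
    print(f"no database match:   {percent(NO_DATABASE_MATCH, N_READS):.1f}%")
    print(f"implied genome size: {implied_genome_size:,.0f} bp")
    print(f"extrapolated indels: {extrapolate(INDEL_JUNCTIONS, REPRESENTATION):.0f}")
```

Under these assumptions the percentages round to the 32% and 18% given in the abstract, and scaling the 11 observed junctions by the sampled fraction gives a value above 150, in line with the abstract's "well over 150" estimate.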

287

Sequence analysis and organization of the Neodiprion abietis nucleopolyhedrovirus genome.  

PubMed

Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses include the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements. PMID:16809301

Duffy, Simon P; Young, Aaron M; Morin, Benoit; Lucarotti, Christopher J; Koop, Ben F; Levin, David B

2006-07-01

288

Biased distribution of DNA uptake sequences towards genome maintenance genes  

PubMed Central

Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9–10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and is thus of significant biological importance. Increased DUS density is expected to enhance DNA uptake and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress.

Davidsen, Tonje; Rødland, Einar A.; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

2004-01-01
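
The DNA uptake sequence abstract above describes a search for the most frequent 9–10mers in a genome. A minimal sketch of plain k-mer counting in Python is given below. It is not the authors' pipeline: it ignores the reverse strand, coding-region boundaries and statistical testing, and the input is a short made-up toy sequence in which one 10-mer is deliberately repeated. The function names are hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer on the given strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def top_kmers(seq, k, n=3):
    """Return the n most frequent k-mers as (k-mer, count) pairs."""
    return count_kmers(seq, k).most_common(n)


# Toy input (not real genome data): an illustrative 10-mer motif is
# embedded three times between unrelated spacer bases.
MOTIF = "GCCGTCTGAA"
toy = "ATGC" + MOTIF + "TTACG" + MOTIF + "CATTG" + MOTIF + "AT"

if __name__ == "__main__":
    for kmer, count in top_kmers(toy, 10):
        print(kmer, count)
```

A real analysis along the lines of the abstract would also need to count both strands and compare motif density between annotated gene groups; this sketch shows only the counting step.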

289

Draft genome sequences of 21 Salmonella enterica serovar enteritidis strains.  

PubMed

Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project. PMID:23045502

Timme, Ruth E; Allard, Marc W; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E; Zheng, Jie; Stones, Robert; Wilson, Mark R; Musser, Steven M; Brown, Eric W

2012-11-01

290

Draft Genome Sequences of 21 Salmonella enterica Serovar Enteritidis Strains  

PubMed Central

Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.

Allard, Marc W.; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E.; Zheng, Jie; Stones, Robert; Wilson, Mark R.; Musser, Steven M.; Brown, Eric W.

2012-01-01

291

Complete genome sequence of Atopobium parvulum type strain (IPP 1246).  

PubMed

Atopobium parvulum (Weinberg et al. 1937) Collins and Wallbanks 1993 comb. nov. is the type strain of the species and belongs to the genomically yet unstudied Atopobium/Olsenella branch of the family Coriobacteriaceae. The species A. parvulum is of interest because its members are frequently isolated from the human oral cavity and are found to be associated with halitosis (oral malodor) but not with periodontitis. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Atopobium, and the 1,543,805 bp long single replicon genome with its 1369 protein-coding and 49 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304653

Copeland, Alex; Sikorski, Johannes; Lapidus, Alla; Nolan, Matt; Del Rio, Tijana Glavina; Lucas, Susan; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Pukall, Rüdiger; Chertkov, Olga; Brettin, Thomas; Han, Cliff; Detter, John C; Kuske, Cheryl; Bruce, David; Goodwin, Lynne; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

2009-09-23

292

Complete genome sequence of Sulfurimonas autotrophica type strain (OK10).  

PubMed

Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle, as it oxidizes sulfur compounds to sulfate, and because of its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as a potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304749

Sikorski, Johannes; Munk, Christine; Lapidus, Alla; Ngatchou Djao, Olivier Duplex; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Han, Cliff; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Sims, David; Meincke, Linda; Brettin, Thomas; Detter, John C; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

2010-10-27

293

Complete genome sequence of Kribbella flavida type strain (IFO 14399).  

PubMed

The genus Kribbella consists of 15 species, with Kribbella flavida (Park et al. 1999) as the type species. The name Kribbella was formed from the acronym of the Korea Research Institute of Bioscience and Biotechnology, KRIBB. Strains of the various Kribbella species were originally isolated from soil, potato, alum slate mine, patinas of catacombs or from horse racecourses. Here we describe the features of K. flavida together with the complete genome sequence and annotation. In addition to the 5.3 Mbp genome of Nocardioides sp. JS614, this is only the second completed genome sequence of the family Nocardioidaceae. The 7,579,488 bp long genome with its 7,086 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304701

Pukall, Rüdiger; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Labutti, Kurt; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pitluck, Sam; Bruce, David; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Brettin, Thomas

2010-03-30

294

Complete Genome Sequence of Equine Herpesvirus Type 9  

PubMed Central

Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome has a length of 148,371 bp and all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9.

Yamaguchi, Tsuyoshi; Yamada, Souichi

2012-01-01

295

Complete genome sequence of Rhodothermus marinus type strain (R-10).  

PubMed

Rhodothermus marinus Alfredsson et al. 1995 is the type species of the genus and is of phylogenetic interest because the Rhodothermaceae represent the deepest lineage in the phylum Bacteroidetes. R. marinus R-10(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from marine hot springs off the coast of Iceland. Strain R-10(T) is strictly aerobic and requires slightly halophilic conditions for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Rhodothermus, and only the second sequence from members of the family Rhodothermaceae. The 3,386,737 bp genome (including a 125 kb plasmid) with its 2914 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304669

Nolan, Matt; Tindall, Brian J; Pomrenke, Helga; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Han, Cliff; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Ovchinikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

2009-12-29

296

Complete genome sequence of Streptobacillus moniliformis type strain (9901T)  

SciTech Connect

Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Sims, David [Los Alamos National Laboratory (LANL); Meincke, Linda [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sproer, Cathrin [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)

2009-01-01

297

Complete genome sequence of Shewanella putrefaciens. Final report  

SciTech Connect

Seventy percent of the costs for genome sequencing Shewanella putrefaciens (oneidensis) were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reduction that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

Heidelberg, John F.

2001-04-01

298

Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences  

PubMed Central

Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Results To develop such resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. BES of common carp are valuable tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp.

2011-01-01
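
The headline figures in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the implied genome size simply inverts the reported 2.5% coverage, so it is a rough estimate rather than a figure stated by the authors.

```python
# Figures quoted in the common carp BES abstract.
bac_clones = 40_224          # BAC clones sequenced on both ends
clean_bes = 65_720           # clean BAC end sequences after processing
total_bp = 42_522_168        # total bases in the clean BES
genome_fraction = 0.025      # "2.5% of the common carp genome"

# Average read length implied by the totals (reported as 647 bp).
mean_read_length = total_bp / clean_bes

# Share of attempted end reads (two per clone) that survived cleaning.
success_rate = clean_bes / (2 * bac_clones)

# Genome size implied by the reported coverage (an estimate, not a quoted value).
implied_genome_bp = total_bp / genome_fraction

print(f"mean BES length: {mean_read_length:.0f} bp")
print(f"clean reads per attempted end: {success_rate:.1%}")
print(f"implied genome size: {implied_genome_bp / 1e9:.2f} Gb")
```

The implied genome size depends entirely on the rounded 2.5% figure, so it should be read as an order-of-magnitude check.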

299

Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX  

PubMed Central

Background SELEX is a well-established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set, with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

Chen, Doris; Lorenz, Christina; Schroeder, Renee

2010-01-01

300

The impact of next-generation sequencing on genomics  

Microsoft Academic Search

This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also

Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang

2011-01-01

301

A Comparison Study of Virus Classification by Genome Sequences  

Microsoft Academic Search

In this study, instead of traditional approaches to virus classification, we proposed a novel approach in the vector space model for virus classification via two types of genome sequences, DNA and CDS. For DNA sequences, the k-mer approach was adopted for pattern extraction, and the entropy of the pattern frequency distribution among classes was used for pattern weighting.

Jing-Doo Wang

2011-01-01
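
The k-mer extraction and entropy weighting mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the `1 - H / log2(n_classes)` weighting formula, and the toy sequences are all hypothetical, chosen only to show how a pattern concentrated in one class can receive a higher weight than one spread evenly across classes.

```python
from collections import Counter
from math import log2

def kmer_counts(seq, k):
    """Count overlapping k-mers made only of A/C/G/T in a DNA sequence."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))

def class_entropy(kmer, class_profiles):
    """Shannon entropy (bits) of one k-mer's frequency distribution across classes."""
    freqs = [profile.get(kmer, 0) for profile in class_profiles.values()]
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * log2(f / total) for f in freqs if f > 0)

def entropy_weights(class_profiles):
    """Assumed weighting: 1 for class-specific k-mers, 0 for evenly spread ones."""
    n_classes = len(class_profiles)
    max_entropy = log2(n_classes) if n_classes > 1 else 1.0
    all_kmers = set().union(*class_profiles.values())
    return {
        kmer: 1.0 - class_entropy(kmer, class_profiles) / max_entropy
        for kmer in all_kmers
    }

# Toy example with two hypothetical virus classes.
profiles = {
    "classA": kmer_counts("AAAACA", 2),
    "classB": kmer_counts("ACACAC", 2),
}
weights = entropy_weights(profiles)
for kmer in sorted(weights):
    print(kmer, round(weights[kmer], 3))
```

In this toy case "AA" occurs only in classA and so gets the maximum weight, while "AC" and "CA" occur in both classes and are down-weighted.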

302

Analysis of Chimpanzee History Based on Genome Sequence Alignments  

Microsoft Academic Search

Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and

Jennifer L. Caswell; Swapan Mallick; Daniel J. Richter; Julie Neubauer; Christine Schirmer; Sante Gnerre; David Reich

2008-01-01

303

Genome Sequence of Fusobacterium nucleatum Subspecies Polymorphum — a Genetically Tractable  

Microsoft Academic Search

Fusobacterium nucleatum is a prominent member of the oral microbiota and is a common cause of human infection. F. nucleatum includes five subspecies: polymorphum, nucleatum, vincentii, fusiforme, and animalis. F. nucleatum subsp. polymorphum ATCC 10953 has been well characterized phenotypically and, in contrast to previously sequenced strains, is amenable to gene transfer. We sequenced and annotated the 2,429,698 bp genome

Sandor E. Karpathy; Xiang Qin; Jason Gioia; Huaiyang Jiang; Yamei Liu; Joseph F. Petrosino; Shailaja Yerrapragada; George E. Fox; Susan Kinder Haake; George M. Weinstock; Sarah K. Highlander

304

Brucella microti: the genome sequence of an emerging pathogen  

Microsoft Academic Search

BACKGROUND: Using a combination of pyrosequencing and conventional Sanger sequencing, the complete genome sequence of the recently described novel Brucella species, Brucella microti, was determined. B. microti is a member of the genus Brucella within the Alphaproteobacteria, which consists of medically important highly pathogenic facultative intracellular bacteria. In contrast to all other Brucella species, B. microti is a fast growing

Stéphane Audic; Magali Lescot; Jean-Michel Claverie; Holger C Scholz

2009-01-01

305

DNA sequence organization in the genomes of five marine invertebrates  

Microsoft Academic Search

The arrangement of repetitive and non-repetitive sequence was studied in the genomic DNA of the oyster (Crassostrea virginica), the surf clam (Spisula solidissima), the horseshoe crab (Limulus polyphemus), a nemertean worm (Cerebratulus lacteus) and a jellyfish (Aurelia aurita). Except for the jellyfish these animals belong to the protostomial branch of animal evolution, for which little information regarding DNA sequence organization

Robert B. Goldberg; William R. Crain; Joan V. Ruderman; Gordon P. Moore; Thomas R. Barnett; Ratchford C. Higgins; Robert A. Gelfand; Glenn A. Galau; Roy J. Britten; Eric H. Davidson

1975-01-01

306

GENOMIC SEQUENCE ANALYSIS OF LEPTOSPIRA BORGPETERSENII SEROVAR HARDJO  

Technology Transfer Automated Retrieval System (TEKTRAN)

A genomic library from Leptospira borgpetersenii serovar hardjo strain JB197 was prepared by mechanically shearing the DNA and inserting it into a positive selection vector. DNA was prepared from approximately 22,000 random clones and used as templates for automated sequencing. Sequence data was c...

307

PHYTOPHTHORA GENOME SEQUENCES UNCOVER EVOLUTIONARY ORIGINS AND MECHANISMS OF PATHOGENESIS  

Technology Transfer Automated Retrieval System (TEKTRAN)

Draft genome sequences of the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopiles with photosynthetic algae such as diatoms, and the Phytophthora sequences sugges...

308

Molecular Poltergeists: Mitochondrial DNA Copies (numts) in Sequenced Nuclear Genomes  

Microsoft Academic Search

The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary

Einat Hazkani-Covo; Raymond M. Zeller; William Martin

2010-01-01

309

Genome Sequencing and Bioinformatics Analyses of Higher Plants Chloroplasts  

Microsoft Academic Search

Chloroplast DNA in higher plants exists as closed circular molecules of about 150 kb (±30), usually presenting inverted repeat sequences separating two single copy regions (1). The complete chloroplast genomes of around 13 higher plant species are available in the gene bank. Our group has completely sequenced the sugarcane chloroplast DNA, which is 141,182 nucleotides in size. We

Helaine Carrer

310

Genome sequencing and analysis of the model grass Brachypodium distachyon  

Microsoft Academic Search

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum

David F. Garvin; Todd C. Mockler; Jeremy Schmutz; Dan Rokhsar; Kerrie Barry; Susan Lucas; Miranda Harmon-Smith; Kathleen Lail; Hope Tice; Jane Grimwood; Neil McKenzie; Naxin Huo; Yong Q. Gu; Gerard R. Lazo; Olin D. Anderson; Frank M. You; Ming-Cheng Luo; Jan Dvorak; Jonathan Wright; Melanie Febrer; Michael W. Bevan; Dominika Idziak; Robert Hasterok; Erika Lindquist; Mei Wang; Samuel E. Fox; Henry D. Priest; Sergei A. Filichkin; Scott A. Givan; Douglas W. Bryant; Jeff H. Chang; Haiyan Wu; Wei Wu; An-Ping Hsia; Patrick S. Schnable; Anantharaman Kalyanaraman; Brad Barbazuk; Todd P. Michael; Samuel P. Hazen; Jennifer N. Bragg; Debbie Laudencia-Chingcuanco; Yiqun Weng; Georg Haberer; Manuel Spannagl; Klaus Mayer; Thomas Rattei; Therese Mitros; Sang-Jik Lee; Jocelyn K. C. Rose; Lukas A. Mueller; Jan P. Buchmann; Jaakko Tanskanen; Heidrun Gundlach; Antonio Costa de Oliveira; Luciano da C. Maia; William Belknap; Ning Jiang; Jinsheng Lai; Liucun Zhu; Jianxin Ma; Cheng Sun; Florent Murat; Michael Abrouk; Remy Bruggmann; Joachim Messing; Noah Fahlgren; Christopher M. Sullivan; James C. Carrington; Elisabeth J. Chapman; Greg D. May; Jixian Zhai; Matthias Ganssmann; Sai Guna Ranjan Gurazada; Marcelo German; Ludmila Tyler; Jiajie Wu; James Thomson; Shan Chen; Henrik V. Scheller; Jesper Harholt; Peter Ulvskov; Jeffrey A. Kimbrel; Laura E. Bartley; Peijian Cao; Ki-Hong Jung; Manoj K. Sharma; Miguel Vega-Sanchez; Pamela Ronald; Christopher D. Dardick; Stefanie de Bodt; Wim Verelst; Dirk Inzé; Maren Heese; Arp Schnittger; Xiaohan Yang; Udaya C. Kalluri; Gerald A. Tuskan; Zhihua Hua; Richard D. Vierstra; Yu Cui; Shuhong Ouyang; Qixin Sun; Zhiyong Liu; Alper Yilmaz; Erich Grotewold; Richard Sibout; Kian Hematy; Gregory Mouille; Herman Höfte; Jérome Pelloux; Devin O'Connor; James Schnable; Scott Rowe; Frank Harmon; Cynthia L. Cass; John C. Sedbrook; Mary E. Byrne; Sean Walsh; Janet Higgins; Pinghua Li; Thomas Brutnell; Turgay Unver; Hikmet Budak; Harry Belcram; Mathieu Charles; Boulos Chalhoub; Ivan Baxter

2010-01-01

311

Whole-genome sequencing of multiple Arabidopsis thaliana populations  

Microsoft Academic Search

The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout

Jun Cao; Korbinian Schneeberger; Stephan Ossowski; Torsten Günther; Sebastian Bender; Joffrey Fitz; Daniel Koenig; Christa Lanz; Oliver Stegle; Christoph Lippert; Xi Wang; Felix Ott; Jonas Müller; Carlos Alonso-Blanco; Karsten Borgwardt; Karl J Schmid; Detlef Weigel

2011-01-01

312

Targeted enrichment of genomic DNA regions for next generation sequencing  

Microsoft Academic Search

In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a powerful tool. We explain how targeted enrichment for next generation sequencing has made great progress

F. Mertens; A. El-Sharawy; S. Sauer; J. Van Helvoort; P. J. Van der Zaag; A. Franke; M. Nilsson; H. Lehrach; A. Brookes

2011-01-01

313

Motivators for participation in a whole-genome sequencing study: implications for translational genomics research  

Microsoft Academic Search

The promise of personalized medicine depends on the ability to integrate genetic sequencing information into disease risk assessment for individuals. As genomic sequencing technology enters the realm of clinical care, its scale necessitates answers to key social and behavioral research questions about the complexities of understanding, communicating, and ultimately using sequence information to improve health. Our study captured the motivations

Flavia M Facio; Stephanie Brooks; Johanna Loewenstein; Susannah Green; Leslie G Biesecker; Barbara B Biesecker

2011-01-01

314

Complete genome sequencing and variant analysis of a Pakistani individual.  

PubMed

We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3,224,311 single-nucleotide polymorphisms (SNPs), of which 388,532 (12% of the total SNPs) had not been previously recorded in the single nucleotide polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified 'retinoic acid signaling' and 'regulation of transcription' as the enriched Gene Ontology terms. Scanning of non-synonymous SNPs against OMIM revealed several disease and phenotype-associated variants in the Pakistani genome. Comparative analysis with an Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ~14,000 genes. Gene Ontology (GO) term analysis of these genes identified 'response to jasmonic acid stimulus', 'aminoglycoside antibiotic metabolic process' and 'glycoside metabolic process' with considerable enrichment. A total of 59,558 small indels (1-5 bp) and 16,063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforced previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomics studies of the ethnically diverse South Asian subcontinent. PMID:23842039

Azim, Muhammad Kamran; Yang, Chuanchun; Yan, Zhixiang; Choudhary, Muhammad Iqbal; Khan, Asifullah; Sun, Xiao; Li, Ran; Asif, Huma; Sharif, Sana; Zhang, Yong

2013-07-11

315

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence  

PubMed Central

Background Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. Results Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. Conclusion Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.

Cheung, Joseph; Estivill, Xavier; Khaja, Razi; MacDonald, Jeffrey R; Lau, Ken; Tsui, Lap-Chee; Scherer, Stephen W

2003-01-01
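
The proportions quoted in the abstract above follow directly from the reported counts. The short check below is a sketch for the reader's convenience; the helper name `percent` and the variable names are ours, and only the numbers come from the abstract.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

assembly_mb = 3043.1          # size of the June 2002 assembly in Mb
duplicated_mb = 107.4         # sequence in segmental duplications
misassigned_mb = 38.9         # sequence in likely misassignments
suspect_snps = 199_965        # SNPs flagged as potential paralogous variants
total_snps = 2_327_473        # SNPs examined in the public databases

dup_pct = percent(duplicated_mb, assembly_mb)
mis_pct = percent(misassigned_mb, assembly_mb)
snp_pct = percent(suspect_snps, total_snps)

print(f"segmental duplications: {dup_pct:.2f}%")
print(f"misassigned sequence:   {mis_pct:.2f}%")
print(f"suspect SNPs:           {snp_pct:.1f}%")
```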

316

Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing.  

PubMed

Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing. PMID:19864250

Horner, David Stephen; Pavesi, Giulio; Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Liuni, Sabino; Sammeth, Michael; Picardi, Ernesto; Pesole, Graziano

2009-10-27

317

Porcine Parvovirus: DNA Sequence and Genome Organization  

Microsoft Academic Search

SUMMARY We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization

ANA I. RANZ; J. J. Manclus; ESMERALDA DIAZ-AROCA

1989-01-01

318

Comparative Mouse Genomics Centers Consortium: The Mouse Genotype Database  

Microsoft Academic Search

The Comparative Mouse Genomics Centers Consortium (CMGCC) is a branch of the Environmental Genome Project sponsored by the National Institute of Environmental Health Sciences (NIEHS) focusing upon the identification of human single nucleotide polymorphisms (SNPs) that may confer disease susceptibility within the human population. The goal of the CMGCC (http://www.niehs.nih.gov/cmgcc/) is to make genetic mouse models for human SNPs within

Jesse C. Wiley; Manjula Prattipati; Ching-Ping Lin; Warren Ladiges

2006-01-01

319

Initial sequence and comparative analysis of the cat genome  

PubMed Central

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ~65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence.

Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schaffer, Alejandro A.; Agarwala, Richa; Narfstrom, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O'Brien, Stephen J.

2007-01-01

320

Combining two technologies for full genome sequencing of human.  

PubMed

At present, new DNA sequencing technologies are developing rapidly, allowing quick and efficient characterisation of organisms at the level of genome structure. In this study, the whole genome sequencing of a human (Russian man) was performed using two technologies currently present on the market - Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) (Applied Biosystems) and sequencing technologies of molecular clusters using fluorescently labeled precursors (Illumina). In total, 108.3 billion base pairs of data were generated (60.2 billion from Illumina technology and 48.1 billion from SOLiD technology). Statistics performed on reads generated by GAII and SOLiD showed that they covered 75% and 96% of the genome, respectively. Short polymorphic regions were detected with comparable accuracy; however, the absolute number revealed by SOLiD was several times lower than that revealed by GAII. An optimal algorithm for using the latest methods of sequencing was established for the analysis of individual human genomes. The study is the first Russian effort towards whole human genome sequencing. PMID:22649622

Skryabin, K G; Prokhortchouk, E B; Mazur, A M; Boulygina, E S; Tsygankova, S V; Nedoluzhko, A V; Rastorguev, S M; Matveev, V B; Chekanov, N N; Goranskaya, D A; Teslyuk, A B; Gruzdeva, N M; Velikhov, V E; Zaridze, D G; Kovalchuk, M V

2009-10-01

321

Combining Two Technologies for Full Genome Sequencing of Human  

PubMed Central

At present, new DNA sequencing technologies are developing rapidly, allowing quick and efficient characterisation of organisms at the level of genome structure. In this study, the whole genome sequencing of a human (Russian man) was performed using two technologies currently present on the market - Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) (Applied Biosystems) and sequencing technologies of molecular clusters using fluorescently labeled precursors (Illumina). In total, 108.3 billion base pairs of data were generated (60.2 billion from Illumina technology and 48.1 billion from SOLiD technology). Statistics performed on reads generated by GAII and SOLiD showed that they covered 75% and 96% of the genome, respectively. Short polymorphic regions were detected with comparable accuracy; however, the absolute number revealed by SOLiD was several times lower than that revealed by GAII. An optimal algorithm for using the latest methods of sequencing was established for the analysis of individual human genomes. The study is the first Russian effort towards whole human genome sequencing.

Skryabin, K.G.; Mazur, A.M.; Boulygina, E.S.; Tsygankova, S.V.; Nedoluzhko, A.V.; Rastorguev, S.M.; Matveev, V.B.; Chekanov, N.N.; Goranskaya, D.A.; Teslyuk, A.B.; Gruzdeva, N.M.; Velikhov, V.E.; Zaridze, D.G.; Kovalchuk, M.V.

2009-01-01

322

The complete mitochondrial genome sequence of Pseudobagrus ussuriensis (Siluriformes: Bagridae).  

PubMed

The complete mitochondrial DNA genome sequence of Ussuri catfish (Pseudobagrus ussuriensis) was determined. The mitochondrial genome is a circular molecule 16,536 bp in length. It contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition of the entire mitogenome is 31.79% A, 26.84% T, 14.87% G, and 26.50% C, showing a high A+T content. The complete mitogenome sequence of P. ussuriensis can be used in studies on molecular systematics, conservation genetics, and stock evaluation. PMID:23351066

Wan, Quan; Tao, Gang; Cheng, Qiqun; Chen, Ying; Qiao, Huiying

2013-01-25
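
The "high A+T content" noted in the abstract above can be made explicit from the reported base percentages. The snippet below is a small illustrative calculation; the variable names are ours, and the per-base counts are approximations derived from rounded percentages, not values given by the authors.

```python
# Base composition (percent) and genome length quoted in the abstract.
composition = {"A": 31.79, "T": 26.84, "G": 14.87, "C": 26.50}
genome_length = 16_536  # bp

total = sum(composition.values())
at_content = composition["A"] + composition["T"]
gc_content = composition["G"] + composition["C"]

# Approximate base counts implied by the rounded percentages.
approx_counts = {
    base: round(genome_length * pct / 100) for base, pct in composition.items()
}

print(f"sum of percentages: {total:.2f}%")
print(f"A+T content: {at_content:.2f}%")
print(f"G+C content: {gc_content:.2f}%")
print("approximate base counts:", approx_counts)
```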

323

Genome Sequence of Sinorhizobium meliloti Rm41.  

PubMed

Sinorhizobium meliloti Rm41 nodulates alfalfa plants, forming indeterminate type nodules. It is characterized by a strain-specific K-antigen able to replace exopolysaccharides in promotion of nodule invasion. We present the Rm41 genome, composed of one chromosome, the chromid pSymB, the megaplasmid pSymA, and the nonsymbiotic plasmid pRme41a. PMID:23405285

Weidner, Stefan; Baumgarth, Birgit; Göttfert, Michael; Jaenicke, Sebastian; Pühler, Alfred; Schneiker-Bekel, Susanne; Serrania, Javier; Szczepanowski, Rafael; Becker, Anke

2013-01-15

324

Genome Sequence of Sinorhizobium meliloti Rm41  

PubMed Central

Sinorhizobium meliloti Rm41 nodulates alfalfa plants, forming indeterminate type nodules. It is characterized by a strain-specific K-antigen able to replace exopolysaccharides in promotion of nodule invasion. We present the Rm41 genome, composed of one chromosome, the chromid pSymB, the megaplasmid pSymA, and the nonsymbiotic plasmid pRme41a.

Weidner, Stefan; Baumgarth, Birgit; Gottfert, Michael; Jaenicke, Sebastian; Puhler, Alfred; Schneiker-Bekel, Susanne; Serrania, Javier; Szczepanowski, Rafael

2013-01-01

325

Genome Sequence of Enterobacter cancerogenus YZ1  

PubMed Central

Enterobacter cancerogenus is usually known as an opportunistic human pathogen. Recently, it has attracted great attention for its capability to produce bioemulsifier, degrade xenobiotics, and resist alkalis and antibiotics. Here we report the complete genome of Enterobacter cancerogenus YZ1, isolated from a bran-feeding Coleoptera insect’s frass.

Wei, Yifeng; Yang, Yu; Zhou, Lisha; Liu, Zhangyi; Wang, Xiangyan; Yang, Rentao; Su, Qingqing; Zhou, Yuping

2013-01-01

326

Bisulfite genomic sequencing of microdissected cells  

Microsoft Academic Search

Mapping of methylation patterns in CpG islands has become an important tool for understanding tissue-specific gene expression in both normal and pathological situations. However, the inherent cellular heterogeneity of any given tissue can affect the outcome and interpretation of molecular studies. In order to analyse genomic DNA methylation on a pure cell population from a tissue sample, we have

Antoine Kerjean; Annick Vieillefond; Nicolas Thiounn; Mathilde Sibony; Marc Jeanpierre; Pierre Jouannet

2001-01-01

327

Comparative assessment of methods for aligning multiple genome sequences.  

PubMed

Multiple sequence alignment is a difficult computational problem. There have been compelling pleas for methods to assess whole-genome multiple sequence alignments and compare the alignments produced by different tools. We assess the four ENCODE alignments, each of which aligns 28 vertebrates on 554 Mbp of total input sequence. We measure the level of agreement among the alignments and compare their coverage and accuracy. We find a disturbing lack of agreement among the alignments not only in species distant from human, but even in mouse, a well-studied model organism. Overall, the assessment shows that Pecan produces the most accurate or nearly most accurate alignment in all species and genomic location categories, while still providing coverage comparable to or better than that of the other alignments in the placental mammals. Our assessment reveals that constructing accurate whole-genome multiple sequence alignments remains a significant challenge, particularly for noncoding regions and distantly related species. PMID:20495551

Chen, Xiaoyu; Tompa, Martin

2010-05-23

328

The genome sequence of the model ascomycete fungus Podospora anserina  

Microsoft Academic Search

Background The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous

Olivier Lespinet; Fabienne Malagnac; Corinne Da Silva; Olivier Jaillon; Betina M Porcel; Arnaud Couloux; Jean-Marc Aury; Béatrice Ségurens; Julie Poulain; Véronique Anthouard; Sandrine Grossetete; Hamid Khalili; Evelyne Coppin; Michelle Déquard-Chablat; Marguerite Picard; Véronique Contamine; Sylvie Arnaise; Anne Bourdais; Véronique Berteaux-Lecellier; Daniel Gautheret; Ronald P de Vries; Evy Battaglia; Pedro M Coutinho; Etienne GJ Danchin; Bernard Henrissat; Riyad EL Khoury; Annie Sainsard-Chanet; Antoine Boivin; Bérangère Pinan-Lucarré; Carole H Sellem; Robert Debuchy; Patrick Wincker; Jean Weissenbach; Philippe Silar

2008-01-01

329

The Diploid Genome Sequence of an Individual Human  

Microsoft Academic Search

Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison

Samuel Levy; Granger Sutton; Pauline C Ng; Lars Feuk; Aaron L Halpern; Brian P Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul A Kravitz; Dana A Busam; Karen Y Beeson; Tina C McIntosh; Karin A Remington; Josep F Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin E Frazier; Stephen W Scherer; Robert L Strausberg; J. Craig Venter

2007-01-01

330

Complete Genome Sequences of Three Strains of Coxsackievirus A7  

PubMed Central

Genomes of three strains (Parker, USSR, and 275/58) of coxsackievirus A7 (CV-A7) were amplified by the long reverse transcription (RT)-PCR method and sequenced. While the sequences of Parker and USSR were identical, the similarities of 275/58 to the CV-A7 reference sequence, accession no. AY421765, were 82.6% and 96.2% for nucleotides and amino acids, respectively.

Yla-Pelto, Jani; Koskinen, Satu; Karelehto, Eveliina; Sittig, Eleonora; Roivainen, Merja; Hyypia, Timo

2013-01-01

331

Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis  

Microsoft Academic Search

Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available

Inês C. Conceição; Anthony D. Long; Jonathan D. Gruber; Patrícia Beldade

2011-01-01

332

Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica  

PubMed Central

Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation.

2011-01-01

333

Sequence-Based Mapping of the Polyploid Wheat Genome  

PubMed Central

The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40–100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.

Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

2013-01-01

334

Pervasive sequence patents cover the entire human genome.  

PubMed

The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays. PMID:23522065

Rosenfeld, Jeffrey; Mason, Christopher E

2013-03-25
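
The 15mer cross-matching described in the abstract above can be illustrated with a minimal sketch. It assumes each gene is available as a plain DNA string and considers only the given strand (no reverse complements); the gene names and sequences are hypothetical toy data, and this is not the authors' pipeline.

```python
def kmers(seq: str, k: int = 15) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genes_sharing_kmers(genes: dict[str, str], k: int = 15) -> dict[str, set[str]]:
    """For each gene, collect the other genes that share at least one k-mer with it."""
    # Index every k-mer by the genes that contain it.
    index: dict[str, set[str]] = {}
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    # A k-mer seen in more than one gene links those genes to each other.
    shared: dict[str, set[str]] = {name: set() for name in genes}
    for names in index.values():
        if len(names) > 1:
            for name in names:
                shared[name] |= names - {name}
    return shared


# Hypothetical toy sequences: geneA and geneB share one 15mer, geneC shares none.
toy_genes = {
    "geneA": "ACGTACGTACGTACGTTTGA",
    "geneB": "GGGACGTACGTACGTACGCC",
    "geneC": "TTTTTTTTTTTTTTTTTTTT",
}
shared = genes_sharing_kmers(toy_genes)
```

Counting `len(shared[name])` per gene gives the kind of "number of other genes matched" figure the abstract reports, here on toy data only.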

335

Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida  

PubMed Central

Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P. multocida strain Pm70.

Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.

2013-01-01

336

The Arabidopsis lyrata genome sequence and the basis of rapid genome size change.  

PubMed

We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis. PMID:21478890

Hu, Tina T; Pattyn, Pedro; Bakker, Erica G; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M; Fahlgren, Noah; Fawcett, Jeffrey A; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D; Ossowski, Stephan; Ottilar, Robert P; Salamov, Asaf A; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E; Bergelson, Joy; Carrington, James C; Gaut, Brandon S; Schmutz, Jeremy; Mayer, Klaus F X; Van de Peer, Yves; Grigoriev, Igor V; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

2011-04-10

337

Genome sequence of the pea aphid Acyrthosiphon pisum.  

PubMed

Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

2010-02-23

338

Genome Sequence of the Pea Aphid Acyrthosiphon pisum  

PubMed Central

Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

2010-01-01

339

Draft genome sequence of the Tibetan antelope.  

PubMed

The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation. PMID:23673643

Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M; Ma, Jian; Lewin, Harris A; Xing, Jinchuan; Platt, Roy N; Ray, David A; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

2013-01-01

340

Rosaceaous Genome Sequencing: Perspectives and Progress  

Microsoft Academic Search

The long-term goal of plant genomics is to identify, isolate and determine the function of plant genes that are associated with both vegetative and reproductive phenotypes. Most phenotypes require the coordinated activity and regulatory control of suites of genes over time and in precise positions within the plant. Until recently, the idea of establishing a comprehensive approach to isolate and

Bryon Sosinski; Vladimir Shulaev; Amit Dhingra; Ananth Kalyanaraman; Roger Bumgarner; Daniel Rokhsar; Ignazio Verde; Riccardo Velasco; Albert G. Abbott

341

Whole-genome haplotyping by dilution, amplification, and sequencing  

PubMed Central

Standard whole-genome genotyping technologies are unable to determine haplotypes. Here we describe a method for rapid and cost-effective long-range haplotyping. Genomic DNA is diluted and distributed into multiple aliquots such that each aliquot receives a fraction of a haploid copy. The DNA template in each aliquot is amplified by multiple displacement amplification, converted into barcoded sequencing libraries using Nextera technology, and sequenced in multiplexed pools. To assess the performance of our method, we combined two male genomic DNA samples at equal ratios, resulting in a sample with diploid X chromosomes with known haplotypes. Pools of the multiplexed sequencing libraries were subjected to targeted pull-down of a 1-Mb contiguous region of the X-chromosome Duchenne muscular dystrophy gene. We were able to phase the Duchenne muscular dystrophy region into two contiguous haplotype blocks with a mean length of 494 kb. The haplotypes showed 99% agreement with the consensus base calls made by sequencing the individual DNAs. We subsequently used the strategy to haplotype two human genomes. Standard genomic sequencing to identify all heterozygous SNPs in the sample was combined with dilution-amplification–based sequencing data to resolve the phase of identified heterozygous SNPs. Using this procedure, we were able to phase >95% of the heterozygous SNPs from the diploid sequence data. The N50 for a Yoruba male DNA was 702 kb whereas the N50 for a European female DNA was 358 kb. Therefore, the strategy described here is suitable for haplotyping of a set of targeted regions as well as of the entire genome.

Kaper, Fiona; Swamy, Sajani; Klotzle, Brandy; Munchel, Sarah; Cottrell, Joseph; Bibikova, Marina; Chuang, Han-Yu; Kruglyak, Semyon; Ronaghi, Mostafa; Eberle, Michael A.; Fan, Jian-Bing

2013-01-01
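
The N50 values quoted for the haplotype blocks in the abstract above follow a common summary statistic. A minimal sketch under the usual definition (the length L such that blocks of length at least L together cover at least half of the total phased length) is given below; the abstract does not state the exact calculation used, and the block lengths here are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of block lengths.

    Blocks are visited from longest to shortest; the N50 is the length of the
    block at which the running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical haplotype block lengths in kb (not data from the study).
block_lengths_kb = [800, 700, 500, 300, 200]
example_n50 = n50(block_lengths_kb)
```

Because the statistic weights long blocks heavily, a few long haplotype blocks can give a high N50 even when many short blocks remain.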

342

Whole-genome haplotyping by dilution, amplification, and sequencing.  

PubMed

Standard whole-genome genotyping technologies are unable to determine haplotypes. Here we describe a method for rapid and cost-effective long-range haplotyping. Genomic DNA is diluted and distributed into multiple aliquots such that each aliquot receives a fraction of a haploid copy. The DNA template in each aliquot is amplified by multiple displacement amplification, converted into barcoded sequencing libraries using Nextera technology, and sequenced in multiplexed pools. To assess the performance of our method, we combined two male genomic DNA samples at equal ratios, resulting in a sample with diploid X chromosomes with known haplotypes. Pools of the multiplexed sequencing libraries were subjected to targeted pull-down of a 1-Mb contiguous region of the X-chromosome Duchenne muscular dystrophy gene. We were able to phase the Duchenne muscular dystrophy region into two contiguous haplotype blocks with a mean length of 494 kb. The haplotypes showed 99% agreement with the consensus base calls made by sequencing the individual DNAs. We subsequently used the strategy to haplotype two human genomes. Standard genomic sequencing to identify all heterozygous SNPs in the sample was combined with dilution-amplification-based sequencing data to resolve the phase of identified heterozygous SNPs. Using this procedure, we were able to phase >95% of the heterozygous SNPs from the diploid sequence data. The N50 for a Yoruba male DNA was 702 kb whereas the N50 for a European female DNA was 358 kb. Therefore, the strategy described here is suitable for haplotyping of a set of targeted regions as well as of the entire genome. PMID:23509297

Kaper, Fiona; Swamy, Sajani; Klotzle, Brandy; Munchel, Sarah; Cottrell, Joseph; Bibikova, Marina; Chuang, Han-Yu; Kruglyak, Semyon; Ronaghi, Mostafa; Eberle, Michael A; Fan, Jian-Bing

2013-03-18

343

Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing  

Microsoft Academic Search

Background  Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development

Shannon CK Straub; Mark Fishbein; Tatyana Livshultz; Zachary Foster; Matthew Parks; Kevin Weitemier; Richard C Cronn; Aaron Liston

2011-01-01

344

A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)  

ScienceCinema

Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

345

DNA sequencing of a cytogenetically normal acute myeloid leukemia genome  

PubMed Central

Lay Summary Acute myeloid leukemia is a highly malignant hematopoietic tumor that affects about 13,000 adults yearly in the United States. The treatment of this disease has changed little in the past two decades, since most of the genetic events that initiate the disease remain undiscovered. Whole genome sequencing is now possible at a reasonable cost and timeframe to utilize this approach for unbiased discovery of tumor-specific somatic mutations that alter the protein-coding genes. Here we show the results obtained by sequencing a typical acute myeloid leukemia genome and its matched normal counterpart, obtained from the patient’s skin. We discovered 10 genes with acquired mutations; two were previously described mutations thought to contribute to tumor progression, and 8 were novel mutations present in virtually all tumor cells at presentation and relapse, whose function is not yet known. Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies. We used massively parallel sequencing technology to sequence the genomic DNA of tumor and normal skin cells obtained from a patient with a typical presentation of FAB M1 Acute Myeloid Leukemia (AML) with normal cytogenetics. 32.7-fold ‘haploid’ coverage (98 billion bases) was obtained for the tumor genome, and 13.9-fold coverage (41.8 billion bases) was obtained for the normal sample. Of 2,647,695 well-supported Single Nucleotide Variants (SNVs) found in the tumor genome, 2,588,486 (97.7%) also were detected in the patient’s skin genome, limiting the number of variants that required further study. For the purposes of this initial study, we restricted our downstream analysis to the coding sequences of annotated genes: we found only eight heterozygous, non-synonymous somatic SNVs in the entire genome. 
All were novel, including mutations in protocadherin/cadherin family members (CDH24 and PCLKC), G-protein coupled receptors (GPR123 and EBI2), a protein phosphatase (PTPRT), a potential guanine nucleotide exchange factor (KNDC1), a peptide/drug transporter (SLC15A1), and a glutamate receptor gene (GRINL1B). We also detected previously described, recurrent somatic insertions in the FLT3 and NPM1 genes. Based on deep readcount data, we determined that all of these mutations (except FLT3) were present in nearly all tumor cells at presentation, and again at relapse 11 months later, suggesting that the patient had a single dominant clone containing all of the mutations. These results demonstrate the power of whole genome sequencing to discover novel cancer-associated mutations.

Ley, Timothy J; Mardis, Elaine R; Ding, Li; Fulton, Bob; McLellan, Michael D; Chen, Ken; Dooling, David; Dunford-Shore, Brian H; McGrath, Sean; Hickenbotham, Matthew; Cook, Lisa; Abbott, Rachel; Larson, David E; Koboldt, Dan C; Pohl, Craig; Smith, Scott; Hawkins, Amy; Abbott, Scott; Locke, Devin; Hillier, LaDeana W; Miner, Tracie; Fulton, Lucinda; Magrini, Vincent; Wylie, Todd; Glasscock, Jarret; Conyers, Joshua; Sander, Nathan; Shi, Xiaoqi; Osborne, John R; Minx, Patrick; Gordon, David; Chinwalla, Asif; Zhao, Yu; Ries, Rhonda E; Payton, Jacqueline E; Westervelt, Peter; Tomasson, Michael H; Watson, Mark; Baty, Jack; Ivanovich, Jennifer; Heath, Sharon; Shannon, William D; Nagarajan, Rakesh; Walter, Matthew J; Link, Daniel C; Graubert, Timothy A; DiPersio, John F; Wilson, Richard K

2008-01-01
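
The coverage figures in the abstract above can be cross-checked with simple arithmetic, assuming fold coverage is defined as total sequenced bases divided by haploid genome size. The function name below is illustrative only; the input numbers are those quoted in the abstract.

```python
def implied_genome_size(total_bases: float, fold_coverage: float) -> float:
    """Haploid genome size implied by coverage = total_bases / genome_size."""
    if fold_coverage <= 0:
        raise ValueError("fold coverage must be positive")
    return total_bases / fold_coverage


# Figures quoted in the abstract: 98 billion bases at 32.7-fold (tumor),
# 41.8 billion bases at 13.9-fold (normal skin).
tumor_size = implied_genome_size(98e9, 32.7)
normal_size = implied_genome_size(41.8e9, 13.9)

print(f"tumor:  {tumor_size / 1e9:.2f} Gb")
print(f"normal: {normal_size / 1e9:.2f} Gb")
```

Both samples imply a haploid genome of roughly 3 Gb, which is consistent with the abstract's use of 'haploid' coverage.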

346

The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics  

Microsoft Academic Search

The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome

Lincoln D Stein; Zhirong Bao; Darin Blasiar; Thomas Blumenthal; Michael R Brent; Nansheng Chen; Asif Chinwalla; Laura Clarke; Chris Clee; Avril Coghlan; Alan Coulson; Peter D'Eustachio; David H. A Fitch; Lucinda A Fulton; Robert E Fulton; Sam Griffiths-Jones; Todd W Harris; LaDeana W Hillier; Ravi Kamath; Patricia E Kuwabara; Elaine R Mardis; Marco A Marra; Tracie L Miner; Patrick Minx; James C Mullikin; Robert W Plumb; Jane Rogers; Jacqueline E Schein; Marc Sohrmann; John Spieth; Jason E Stajich; Chaochun Wei; David Willey; Richard K Wilson; Richard Durbin; Robert H Waterston

2003-01-01

347

The genome sequence of caenorhabditis briggsae: a platform for comparative genomics  

Microsoft Academic Search

The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome

Lincoln D. Stein; Zhirong Bao; Darin Blasiar; Thomas Blumenthal; Michael R. Brent; Nansheng Chen; Asif Chinwalla; Laura Clarke; Chris Clee; Avril Coghlan; Alan Coulson; Peter D'Eustachio; David H. A. Fitch; Lucinda A. Fulton; Robert E. Fulton; Sam Griffiths-Jones; Todd W. Harris; LaDeana W. Hillier; Ravi Kamath; Patricia E. Kuwabara; Elaine R. Mardis; Marco A. Marra; Tracie L. Miner; Patrick Minx; James C. Mullikin; Robert W. Plumb; Jane Rogers; Jacqueline E. Schein; Marc Sohrmann; John Spieth; Jason E. Stajich; Chaochun Wei; David Willey; Richard K. Wilson; Richard Durbin; Robert H. Waterston

2003-01-01

348

Rapid genome evolution as revealed by comparative sequence analysis of orthologous regions from four triticeae genomes  

Technology Transfer Automated Retrieval System (TEKTRAN)

Bread wheat (Triticum aestivum) is a hexaploid species, consisting of three subgenomes (A, B, and D). To study the molecular evolution of these closely related genomes, we compared the sequence of a 307-kb physical contig covering the HMW-glutenin locus from the A genome of durum wheat Triticum turg...

349

Patterns of damage in genomic DNA sequences from a Neandertal.  

PubMed

High-throughput direct sequencing techniques have recently opened the possibility to sequence genomes from Pleistocene organisms. Here we analyze DNA sequences determined from a Neandertal, a mammoth, and a cave bear. We show that purines are overrepresented at positions adjacent to the breaks in the ancient DNA, suggesting that depurination has contributed to its degradation. We furthermore show that substitutions resulting from miscoding cytosine residues are vastly overrepresented in the DNA sequences and drastically clustered in the ends of the molecules, whereas other substitutions are rare. We present a model where the observed substitution patterns are used to estimate the rate of deamination of cytosine residues in single- and double-stranded portions of the DNA, the length of single-stranded ends, and the frequency of nicks. The results suggest that reliable genome sequences can be obtained from Pleistocene organisms. PMID:17715061

Briggs, Adrian W; Stenzel, Udo; Johnson, Philip L F; Green, Richard E; Kelso, Janet; Prüfer, Kay; Meyer, Matthias; Krause, Johannes; Ronan, Michael T; Lachmann, Michael; Pääbo, Svante

2007-08-21

350

Triticeae genomics: advances in sequence analysis of large genome cereal crops  

Microsoft Academic Search

Whole genome sequencing provides direct access to all genes of an organism and represents an essential step towards a systematic understanding of (crop) plant biology. Wheat and barley, two of the most important crop species worldwide, have two- to five-fold larger genomes than human – too large to be completely sequenced at current costs. Nevertheless, significant progress has been made

Nils Stein

2007-01-01

351

Complete genome sequence of Meiothermus ruber type strain (21T)  

SciTech Connect

Meiothermus ruber (Loginova et al. 1984) Nobre et al. 1996 is the type species of the genus Meiothermus. This thermophilic genus is of special interest, as its members can be affiliated to either low-temperature or high-temperature groups. The temperature related split is in accordance with the chemotaxonomic feature of the polar lipids. M. ruber is a representative of the low-temperature group. This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,097,457 bp long genome with its 3,052 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Goltsman, Eugene [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Fahnrich, Regine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute

2010-01-01

352

Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)  

PubMed Central

Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the family Campylobacteraceae within the Epsilonproteobacteria. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Pati, Amrita; Gronow, Sabine; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Chertkov, Olga; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Detter, John C.; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

2010-01-01

353

Complete genomic DNA sequence of rock bream iridovirus.  

PubMed

Iridovirus is a causative agent of epizootics among cultured rock bream (Oplegnathus fasciatus) in Korea. Here, we report the complete genomic sequence of rock bream iridovirus (RBIV). The genome of RBIV was 112080 bp long and contained at least 118 putative open reading frames (ORFs), and its genome organization was similar to that of infectious spleen and kidney necrosis virus (ISKNV). Of the RBIV's 118 ORFs, 85 ORFs showed 60-99% amino acid identity to those of ISKNV. Phylogenetic analysis of major capsid protein (MCP), DNA repair protein RAD2, and DNA polymerase type-B family indicated that RBIV is closely related to red sea bream iridovirus (RSIV), Grouper sleepy disease iridovirus (GSDIV), Dwarf gourami iridovirus (DGIV), and ISKNV. The genome sequence provides useful information concerning the evolution and divergence of iridoviruses in cultured fish. PMID:15246274

Do, Jeong Wan; Moon, Chang Hoon; Kim, Hyo Jeong; Ko, Myoung Seok; Kim, Sung Bum; Son, Ji Hee; Kim, Jong Soo; An, Eun Jeong; Kim, Min Kyoung; Lee, Seung Koo; Han, Myung Shin; Cha, Seung Ju; Park, Mi Seon; Park, Myoung Ae; Kim, Yi Cheong; Kim, Jin Woo; Park, Jeong Woo

2004-08-01

354

Genome Sequence of a Baculovirus Pathogenic for Culex nigripalpus  

Microsoft Academic Search

In this report we describe the complete genome sequence of a nucleopolyhedrovirus that infects larval stages of the mosquito Culex nigripalpus (CuniNPV). The CuniNPV genome is a circular double-stranded DNA molecule of 108,252 bp and is predicted to contain 109 genes. Although 36 of these genes show homology to genes from other baculoviruses, their orientation and order exhibit little conservation

C. L. Afonso; E. R. Tulman; Z. Lu; C. A. Balinsky; B. A. Moser; J. J. Becnel; D. L. Rock; G. F. Kutish

2001-01-01

355

The complete sequence of the Adoxophyes orana granulovirus genome  

Microsoft Academic Search

The nucleotide sequence of the Adoxophyes orana granulovirus (AdorGV) DNA genome was determined and analysed. The genome contains 99,657 bp and has an A + T content of 65.5%. The analysis predicted 119 ORFs of 150 nucleotides or larger that showed minimal overlap. Of these putative genes, 104 (87%) were homologous to genes identified previously in other baculoviruses. The mean

Sally Wormleaton; John Kuzio; Doreen Winstanley

2003-01-01

356

The complete sequence of the Cydia pomonella granulovirus genome  

Microsoft Academic Search

The nucleotide sequence of the DNA genome of Cydia pomonella granulovirus (CpGV) was determined and analysed. The genome is composed of 123500 bp and has a G+C content of 45.2%. It contains 143 ORFs of 150 nucleotides or more that show minimal overlap. One hundred and eighteen (82.5%) of these putative genes are homologous to genes previously identified in other baculoviruses. Among

Teresa Luque; Ruth Finch; Norman Crook; David R. O'Reilly; Doreen Winstanley

357

Conserved terminal sequences of rice ragged stunt virus genomic RNA  

Microsoft Academic Search

The 5'- and 3'-terminal nucleotide sequences of the dsRNA genome segments of rice ragged stunt virus (RRSV), a member of the plant Reoviridae, were determined and compared with those published for other viruses in this family. The 5'- and 3'-terminal regions of the RRSV plus strand RNA from all genome segments were found to have the same conserved hexanucleotide (5'

Jin Yan; Hiroshi Kudo; Ichiro Uyeda; Sang-Yong Lee; Eishiro Shikata

1992-01-01

358

Complete Sequence of the Citrus Tristeza Virus RNA Genome  

Microsoft Academic Search

The sequence of the entire genome of citrus tristeza virus (CTV), Florida isolate T36, was completed. The 19,296-nt CTV genome encodes 12 open reading frames (ORFs) potentially coding for at least 17 protein products. The 5'-proximal ORF 1a starts at nucleotide 108 and encodes a large polyprotein with calculated MW of 349 kDa containing domains characteristic of (from 5' to

A. V. Karasev; V. P. Boyko; S. Gowda; O. V. Nikolaeva; M. E. Hilf; E. V. Koonin; C. L. Niblett; K. Cline; D. J. Gumpf; R. F. Lee; S. M. Garnsey; D. J. Lewandowski; W. O. Dawson

1995-01-01

359

The Eimeria genome projects: a sequence of events  

Microsoft Academic Search

An international consortium is the driving force behind several new genome-related projects, mainly focused on Eimeria tenella, the cause of avian, caecal coccidiosis. The largest project is a whole genome shotgun project, which is at 8.3-fold coverage, and is complemented by complete sequencing of the two smallest E. tenella chromosomes and the provision of a physical framework through HAPPY mapping.

Martin W. Shirley; Al Ivens; Arthur Gruber; Alda M. B. N. Madeira; Kiew-Lian Wan; Paul H. Dear; Fiona M. Tomley

2004-01-01

360

Complete Genome Sequence of the Rearranged Porcine Circovirus Type 2  

PubMed Central

Here we report for the first time the genome sequences of 4 rearranged porcine circovirus type 2 strains, JSTZ, ZJQDH1, ZJQDH2, and JSHM, isolated from porcine sera in China. The complete circular genomes of these isolates are 578, 483, 574, and 772 nucleotides in length, respectively. They are predicted to be defective interfering particles of porcine circovirus type 2. The findings will help us to understand the molecular evolution of porcine circovirus type 2 and the relationship between porcine circovirus type 2 and diseases.

Wen, Libin; Ni, Yanxiu; Zhang, Xuehan; Li, Bin; Wang, Xiaomin; Guo, Rong li; Yu, Zhengyu; Mao, Aihua; Zhou, Junming; Lv, Lixin; Jiang, Jieyuan

2012-01-01

361

Complete genome sequence of the fish pathogen Flavobacterium psychrophilum  

Microsoft Academic Search

We report here the complete genome sequence of the virulent strain JIP02/86 (ATCC 49511) of Flavobacterium psychrophilum, a widely distributed pathogen of wild and cultured salmonid fish. The genome consists of a 2,861,988–base pair (bp) circular chromosome with 2,432 predicted protein-coding genes. Among these predicted proteins, stress response mediators, gliding motility proteins, adhesins and many putative secreted proteases are probably

Mekki Boussaha; Valentin Loux; Jean-François Bernardet; Christian Michel; Brigitte Kerouault; Stanislas Mondot; Pierre Nicolas; Robert Bossy; Christophe Caron; Philippe Bessières; Jean-François Gibrat; Stéphane Claverol; Fabien Dumetz; Michel Le Hénaff; Abdenour Benmansour; Eric Duchaud

2007-01-01

362

Complete sequence of the Pepino mosaic virus RNA genome  

Microsoft Academic Search

Summary. We have determined the complete nucleotide sequence (Accession No. AF484251) of the Pepino mosaic virus (PepMV) RNA genome. PepMV is the etiological agent of a new disease which affects tomato crops in Europe and North America. The PepMV genome consists of one single stranded positive sense RNA 6410 nt long that contains five open reading frames (ORFs). ORF 1 is the

J. M. Aguilar; M. D. Hernández-Gallardo; J. L. Cenis; A. Lacasa; M. A. Aranda

2002-01-01

363

Draft Genome Sequences of Actinobacillus pleuropneumoniae Serotypes 2 and 6  

PubMed Central

Actinobacillus pleuropneumoniae is a bacterial pathogen that causes highly contagious respiratory infection in pigs and has a serious impact on the production economy and animal welfare. As clear differences in virulence between serotypes have been observed, the genetic basis should be investigated at the genomic level. Here, we present the draft genome sequences of the A. pleuropneumoniae serotypes 2 (strain 4226) and 6 (strain Femo).

Zhan, Bujie; Angen, Øystein; Hedegaard, Jakob; Bendixen, Christian; Panitz, Frank

2010-01-01

364

Complete Genome Sequence of Pelagibacterium halotolerans B2T  

PubMed Central

Pelagibacterium halotolerans B2T is a marine halotolerant bacterium that was isolated from a seawater sample collected from the East China Sea. Here, we present the complete genome sequence of the type strain P. halotolerans B2T, which consists of one chromosome (3,944,837 bp; 61.4% G+C content) and one plasmid (4,050 bp; 56.1% G+C content). This is the first complete genome of a member of the Pelagibacterium genus.

Huo, Ying-Yi; Cheng, Hong; Han, Xi-Fang; Jiang, Xia-Wei; Sun, Cong; Zhang, Xin-Qi; Zhu, Xu-Fen; Liu, Yong-Feng; Li, Peng-Fei; Ni, Pei-Xiang

2012-01-01

365

The genome sequence of the filamentous fungus Neurospora crassa  

Microsoft Academic Search

Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes, more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis

James E. Galagan; Sarah E. Calvo; Katherine A. Borkovich; Eric U. Selker; Nick D. Read; David Jaffe; William FitzHugh; Li-Jun Ma; Serge Smirnov; Seth Purcell; Bushra Rehman; Timothy Elkins; Reinhard Engels; Shunguang Wang; Cydney B. Nielsen; Jonathan Butler; Matthew Endrizzi; Dayong Qui; Peter Ianakiev; Deborah Bell-Pedersen; Mary Anne Nelson; Margaret Werner-Washburne; Claude P. Selitrennikoff; John A. Kinsey; Edward L. Braun; Alex Zelter; Ulrich Schulte; Gregory O. Kothe; Gregory Jedd; Werner Mewes; Chuck Staben; Edward Marcotte; David Greenberg; Alice Roy; Karen Foley; Jerome Naylor; Nicole Stange-Thomann; Robert Barrett; Sante Gnerre; Michael Kamal; Manolis Kamvysselis; Evan Mauceli; Cord Bielke; Stephen Rudd; Dmitrij Frishman; Svetlana Krystofova; Carolyn Rasmussen; Robert L. Metzenberg; David D. Perkins; Scott Kroken; Carlo Cogoni; Giuseppe Macino; David Catcheside; Weixi Li; Robert J. Pratt; Stephen A. Osmani; Colin P. C. DeSouza; Louise Glass; Marc J. Orbach; J. Andrew Berglund; Rodger Voelker; Oded Yarden; Michael Plamann; Stephan Seiler; Jay Dunlap; Alan Radford; Rodolfo Aramayo; Donald O. Natvig; Lisa A. Alex; Gertrud Mannhaupt; Daniel J. Ebbole; Michael Freitag; Ian Paulsen; Matthew S. Sachs; Eric S. Lander; Chad Nusbaum; Bruce Birren

2003-01-01

366

Genome sequence of the plant pathogen Ralstonia solanacearum  

Microsoft Academic Search

Ralstonia solanacearum is a devastating, soil-borne plant pathogen with a global distribution and an unusually wide host range. It is a model system for the dissection of molecular determinants governing pathogenicity. We present here the complete genome sequence and its analysis of strain GMI1000. The 5.8-megabase (Mb) genome is organized into two replicons: a 3.7-Mb chromosome and a 2.1-Mb megaplasmid.

M. Salanoubat; S. Genin; F. Artiguenave; J. Gouzy; S. Mangenot; M. Arlat; A. Billault; P. Brottier; J. C. Camus; L. Cattolico; M. Chandler; N. Choisne; C. Claudel-Renard; S. Cunnac; N. Demange; C. Gaspin; M. Lavie; A. Moisan; C. Robert; W. Saurin; T. Schiex; P. Siguier; P. Thébault; M. Whalen; P. Wincker; M. Levy; J. Weissenbach; C. A. Boucher

2002-01-01

367

Complete genome sequence of human adenovirus prototype 17.  

PubMed

As one of the first five human adenoviruses (HAdVs) to be sequenced, type 17 was important as a reference tool for comparative genomics of recently isolated HAdV pathogens in species D. HAdV-D17 was the first species D adenovirus to be sequenced and was deposited in GenBank in 1999. These genome data were not of high quality, and a redetermination of the same stock virus provides corrected data; among the differences are a length of 35,139 bp versus 35,100 bp in the original, and 160 mismatches to the original genome were found. Annotation of the coding sequences reveals 39 as opposed to 8, a finding which is important for phylogenomic studies. PMID:21980031

Dehghan, Shoaleh; Seto, Jason; Hudson, Nolan R; Robinson, Christopher M; Jones, Morris S; Dyer, David W; Chodosh, James; Seto, Donald

2011-11-01

368

Molecular Poltergeists: Mitochondrial DNA Copies (numts) in Sequenced Nuclear Genomes  

PubMed Central

The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary time. In the laboratory and in nature, numts enter the nuclear DNA via non-homologous end joining (NHEJ) at double-strand breaks (DSBs). The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency. Polymorphic numts in humans link maternally inherited mitochondrial genotypes to nuclear DNA haplotypes during the past, offering new opportunities to associate nuclear markers with mitochondrial markers back in time.

Hazkani-Covo, Einat; Zeller, Raymond M.; Martin, William

2010-01-01

369

Molecular poltergeists: mitochondrial DNA copies (numts) in sequenced nuclear genomes.  

PubMed

The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary time. In the laboratory and in nature, numts enter the nuclear DNA via non-homologous end joining (NHEJ) at double-strand breaks (DSBs). The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency. Polymorphic numts in humans link maternally inherited mitochondrial genotypes to nuclear DNA haplotypes during the past, offering new opportunities to associate nuclear markers with mitochondrial markers back in time. PMID:20168995

Hazkani-Covo, Einat; Zeller, Raymond M; Martin, William

2010-02-12

370

Whole-Genome Sequencing for Optimized Patient Management  

PubMed Central

Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)–responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins.

Bainbridge, Matthew N.; Wiszniewski, Wojciech; Murdock, David R.; Friedman, Jennifer; Gonzaga-Jauregui, Claudia; Newsham, Irene; Reid, Jeffrey G.; Fink, John K.; Morgan, Margaret B.; Gingras, Marie-Claude; Muzny, Donna M.; Hoang, Linh D.; Yousaf, Shahed; Lupski, James R.; Gibbs, Richard A.

2012-01-01

371

Whole-genome sequencing for optimized patient management.  

PubMed

Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)-responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins. PMID:21677200

Bainbridge, Matthew N; Wiszniewski, Wojciech; Murdock, David R; Friedman, Jennifer; Gonzaga-Jauregui, Claudia; Newsham, Irene; Reid, Jeffrey G; Fink, John K; Morgan, Margaret B; Gingras, Marie-Claude; Muzny, Donna M; Hoang, Linh D; Yousaf, Shahed; Lupski, James R; Gibbs, Richard A

2011-06-15

372

Bayesian inference of ancient human demography from individual genome sequences.  

PubMed

Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108-157 thousand years ago, that Eurasians diverged from an ancestral African population 38-64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ~9,000. PMID:21926973

Gronau, Ilan; Hubisz, Melissa J; Gulko, Brad; Danko, Charles G; Siepel, Adam

2011-09-18

373

The complete genome sequence of Escherichia coli K-12.  

PubMed

The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer. PMID:9278503

Blattner, F R; Plunkett, G; Bloch, C A; Perna, N T; Burland, V; Riley, M; Collado-Vides, J; Glasner, J D; Rode, C K; Mayhew, G F; Gregor, J; Davis, N W; Kirkpatrick, H A; Goeden, M A; Rose, D J; Mau, B; Shao, Y

1997-09-01

374

Genome sequence of the model medicinal mushroom Ganoderma lucidum  

PubMed Central

Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi.

Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao

2012-01-01

375

A Complete Sequence of the T. tengcongensis Genome  

PubMed Central

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4T (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the rest, 824 CDS (31.8%), are functionally unknown. One of the interesting features of the T. tengcongensis genome is that 86.7% of its genes are encoded on the leading strand of DNA replication. Based on protein sequence similarity, the T. tengcongensis genome is most similar to that of Bacillus halodurans, a mesophilic eubacterium, among all prokaryotic genomes fully sequenced to date. Computational analysis of genes involved in basic metabolic pathways supports the experimental discovery that T. tengcongensis metabolizes sugars as its principal energy and carbon source and utilizes thiosulfate and elemental sulfur, but not sulfate, as electron acceptors. T. tengcongensis, a gram-negative rod by empirical definitions (such as staining), shares many genes that are characteristic of gram-positive bacteria, whereas it is missing molecular components unique to gram-negative bacteria. A strong correlation between the G + C content of tDNA and rDNA genes and the optimal growth temperature is found among the sequenced thermophiles. It is concluded that thermophiles are a biologically and phylogenetically divergent group of prokaryotes that have converged to sustain extreme environmental conditions over evolutionary timescales. [Supplemental material is available online at http://www.genome.org.]

Bao, Qiyu; Tian, Yuqing; Li, Wei; Xu, Zuyuan; Xuan, Zhenyu; Hu, Songnian; Dong, Wei; Yang, Jian; Chen, Yanjiong; Xue, Yanfen; Xu, Yi; Lai, Xiaoqin; Huang, Li; Dong, Xiuzhu; Ma, Yanhe; Ling, Lunjiang; Tan, Huarong; Chen, Runsheng; Wang, Jian; Yu, Jun; Yang, Huanming

2002-01-01
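
The percentages quoted in the T. tengcongensis abstract above can be checked with simple arithmetic. The following is a minimal sketch only: the helper names `percent` and `gc_content` are hypothetical and introduced here for illustration, and the counts are the ones stated in the abstract (2588 CDS, of which 1764 are classified and 824 are functionally unknown).

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA string (case-insensitive).

    Hypothetical helper: it only shows how a G+C content figure is
    defined, it is not the method used by the authors.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return round(100.0 * gc / len(seq), 1)


# Counts taken from the abstract.
total_cds = 2588
classified = 1764
unknown = 824

# The two categories should account for every predicted CDS.
assert classified + unknown == total_cds

print(percent(classified, total_cds))  # 68.2
print(percent(unknown, total_cds))     # 31.8
```

Both values agree with the 68.2% and 31.8% reported in the abstract.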

376

Normalization and centering of array-based heterologous genome hybridization based on divergent control probes  

Microsoft Academic Search

Background: Hybridization of heterologous (non-specific) nucleic acids onto arrays designed for model-organisms has been proposed as a viable genomic resource for estimating sequence variation and gene expression in non-model organisms. However, conventional methods of normalization that assume equivalent distributions (such as quantile normalization) are inappropriate when applied to non-specific (heterologous) hybridization. We propose an algorithm for normalizing and centering intensity data

Brian J Darby; Kenneth L Jones; David Wheeler; Michael A Herman

2011-01-01
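
The abstract above names quantile normalization as a conventional method that assumes equivalent intensity distributions across arrays. As a rough illustration of what that assumption means, here is a minimal sketch of plain quantile normalization. It is not the algorithm proposed by the authors (the abstract is truncated before describing it); the function name `quantile_normalize` and the toy intensities are assumptions made for this example, and ties are handled naively.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Force every sample (array) to share the same intensity distribution.

    Each value is replaced by the mean, across samples, of the values
    that hold the same rank. Ties are broken by position (naive).
    """
    if not samples:
        raise ValueError("no samples given")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Mean intensity at each rank across all samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n)
    ]

    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each (made-up intensities).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the same set of values, which is the "equivalent distributions" assumption the abstract describes as inappropriate for heterologous hybridization.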

377

Sequence complexity profiles of prokaryotic genomic sequences: A fast algorithm for calculating linguistic complexity  

Microsoft Academic Search

Motivation: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal.

Olga G. Troyanskaya; Ora Arbell; Yair Koren; Gad M. Landau; Alexander Bolshoy

2002-01-01
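
The abstract above relates repetitiveness to a "linguistic complexity" score but is truncated before defining it. The sketch below uses one commonly cited formulation as an assumption: the number of distinct substrings (k-mers) observed, divided by the maximum number possible for a sequence of that length, summed over word lengths 1 to `max_k`. It is a naive implementation for illustration, not the fast algorithm the title refers to, and the function name and parameters are hypothetical.

```python
def linguistic_complexity(seq: str, max_k: int, alphabet_size: int = 4) -> float:
    """Ratio of observed to maximum possible distinct k-mers, k = 1..max_k.

    A value near 1.0 means the sequence uses almost every word it could;
    a low value means the sequence is highly repetitive.
    """
    n = len(seq)
    if not 1 <= max_k <= n:
        raise ValueError("max_k must be between 1 and len(seq)")

    observed = 0
    possible = 0
    for k in range(1, max_k + 1):
        windows = n - k + 1
        # Distinct words of length k that actually occur.
        observed += len({seq[i:i + k] for i in range(windows)})
        # Upper bound: limited by alphabet size and by sequence length.
        possible += min(alphabet_size ** k, windows)
    return observed / possible


print(linguistic_complexity("ACGT", max_k=2))  # 1.0
print(linguistic_complexity("AAAA", max_k=2))  # 2/7, about 0.286
```

The repetitive string scores much lower than the non-repetitive one, which matches the abstract's point that repeats lower the apparent complexity of a genomic text.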

378

Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies  

Microsoft Academic Search

With genome sequencing efforts increasing exponentially, valuable information accumulates on genomic content of the various organisms sequenced. Projector 2 uses (un)finished genomic sequences of an organism as a template to infer linkage information for a genome sequence assembly of a related organism being sequenced. The remaining gaps between contigs for which no linkage information is present can

Sacha A. F. T. Van Hijum; Aldert L. Zomer; Oscar P. Kuipers; Jan Kok

2005-01-01

379

Draft Genome Sequence of Bacillus endophyticus 2102  

PubMed Central

Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents.

Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung

2012-01-01

380

Revised Genome Sequence of Brucella suis 1330  

PubMed Central

Brucella suis is a causative agent of porcine brucellosis. We report the resequencing of the original sample upon which the published sequence of Brucella suis 1330 is based and describe the differences between the published assembly and our assembly at 12 loci.

Tae, Hongseok; Shallom, Shamira; Settlage, Robert; Preston, Dale; Adams, L. Garry; Garner, Harold R.

2011-01-01

381

Complete genomic sequence of Pasteurella multocida, Pm70  

PubMed Central

We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ~270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ~1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.

May, Barbara J.; Zhang, Qing; Li, Ling Ling; Paustian, Michael L.; Whittam, Thomas S.; Kapur, Vivek

2001-01-01

382

The genome sequence of the colonial chordate, Botryllus schlosseri.  

PubMed

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI:http://dx.doi.org/10.7554/eLife.00569.001. PMID:23840927

Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

2013-07-02

383

Melanoma genome sequencing reveals frequent PREX2 mutations.  

PubMed

Melanoma is notable for its metastatic propensity, lethality in the advanced setting and association with ultraviolet exposure early in life. To obtain a comprehensive genomic view of melanoma in humans, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-ultraviolet-exposed hairless skin of the extremities (3 and 14 per megabase (Mb) of genome), intermediate in those originating from hair-bearing skin of the trunk (5-55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2 (phosphatidylinositol-3,4,5-trisphosphate-dependent Rac exchange factor 2)--a PTEN-interacting protein and negative regulator of PTEN in breast cancer--as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas. PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumour formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumours revealed genomic evidence of ultraviolet pathogenesis and discovered a new recurrently mutated gene in melanoma. PMID:22622578

Berger, Michael F; Hodis, Eran; Heffernan, Timothy P; Deribe, Yonathan Lissanu; Lawrence, Michael S; Protopopov, Alexei; Ivanova, Elena; Watson, Ian R; Nickerson, Elizabeth; Ghosh, Papia; Zhang, Hailei; Zeid, Rhamy; Ren, Xiaojia; Cibulskis, Kristian; Sivachenko, Andrey Y; Wagle, Nikhil; Sucker, Antje; Sougnez, Carrie; Onofrio, Robert; Ambrogio, Lauren; Auclair, Daniel; Fennell, Timothy; Carter, Scott L; Drier, Yotam; Stojanov, Petar; Singer, Meredith A; Voet, Douglas; Jing, Rui; Saksena, Gordon; Barretina, Jordi; Ramos, Alex H; Pugh, Trevor J; Stransky, Nicolas; Parkin, Melissa; Winckler, Wendy; Mahan, Scott; Ardlie, Kristin; Baldwin, Jennifer; Wargo, Jennifer; Schadendorf, Dirk; Meyerson, Matthew; Gabriel, Stacey B; Golub, Todd R; Wagner, Stephan N; Lander, Eric S; Getz, Gad; Chin, Lynda; Garraway, Levi A

2012-05-09

384

The genome sequence of the colonial chordate, Botryllus schlosseri  

PubMed Central

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001

Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

2013-01-01

385

Workshop: Can you assemble whole genomes from next generation sequencing data?  

Microsoft Academic Search

Despite recent advances in sequencing technologies, our ability to reconstruct complete genomes from sequencing data has not improved. We will overview several of the challenges that need to be overcome to enable the full reconstruction of genomes from sequencing data.

Mihai Pop

2011-01-01

386

An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes  

Microsoft Academic Search

BACKGROUND: With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. RESULTS: In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome

Christopher D Herring; Bernhard Ø Palsson

2007-01-01

387

A cryptographic approach to securely share and query genomic sequences.  

PubMed

To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we present a novel cryptographic framework that enables organizations to support genomic data mining without disclosing the raw genomic sequences. Organizations contribute encrypted genomic sequence records into a centralized repository, where the administrator can perform queries, such as frequency counts, without decrypting the data. We evaluate the efficiency of our framework with existing databases of single nucleotide polymorphism (SNP) sequences and demonstrate that the time needed to complete count queries is feasible for real world applications. For example, our experiments indicate that a count query over 40 SNPs in a database of 5000 records can be completed in approximately 30 min with off-the-shelf technology. We further show that approximation strategies can be applied to significantly speed up query execution times with minimal loss in accuracy. The framework can be implemented on top of existing information and network technologies in biomedical environments. PMID:18779075

Kantarcioglu, Murat; Jiang, Wei; Liu, Ying; Malin, Bradley

2008-09-01
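
The abstract above does not say which cryptosystem the framework uses. As a rough illustration only, the sketch below assumes an additively homomorphic scheme (textbook Paillier with tiny, insecure demonstration primes) to show how a repository could add up encrypted 0/1 allele indicators without decrypting any single record. Every function name and parameter here is hypothetical and is not taken from the paper.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key generation. The default primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # standard simple choice of generator
    mu = pow(lam, -1, n)           # valid because L(g^lam mod n^2) = lam mod n when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt a small non-negative integer m < n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def encrypted_count(pub, ciphertexts):
    """Sum encrypted 0/1 indicators without seeing any plaintext."""
    total = encrypt(pub, 0)
    for c in ciphertexts:
        total = add_encrypted(pub, total, c)
    return total

if __name__ == "__main__":
    pub, priv = keygen()
    # Hypothetical data: 1 means the record carries the queried SNP allele.
    indicators = [1, 0, 1, 1, 0, 0, 1]
    encrypted_records = [encrypt(pub, b) for b in indicators]
    result = encrypted_count(pub, encrypted_records)
    print("allele count:", decrypt(pub, priv, result))
```

Only the holder of the private key learns the final count; the party doing the aggregation handles ciphertexts throughout. A multi-SNP pattern query, as described in the abstract, would need more machinery than this single-indicator sum.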

388

Complete genome sequence of Pyrolobus fumarii type strain (1AT)  

SciTech Connect

Pyrolobus fumarii Blöchl et al. 1997 is the type species of the genus Pyrolobus, which belongs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii exhibits currently the highest optimal growth temperature of all life forms on earth (106 °C). This is the first completed genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on September 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Hammon, Nancy [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Huber, Harald [Universitat Regensburg, Regensburg, Germany; Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]

2011-01-01

389

Mapping and sequencing of structural variation from eight human genomes  

PubMed Central

Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.

Kidd, Jeffrey M.; Cooper, Gregory M.; Donahue, William F.; Hayden, Hillary S.; Sampas, Nick; Graves, Tina; Hansen, Nancy; Teague, Brian; Alkan, Can; Antonacci, Francesca; Haugen, Eric; Zerr, Troy; Yamada, N. Alice; Tsang, Peter; Newman, Tera L.; Tuzun, Eray; Cheng, Ze; Ebling, Heather M.; Tusneem, Nadeem; David, Robert; Gillett, Will; Phelps, Karen A.; Weaver, Molly; Saranga, David; Brand, Adrianne; Tao, Wei; Gustafson, Erik; McKernan, Kevin; Chen, Lin; Malig, Maika; Smith, Joshua D.; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David A.; Peiffer, Daniel A.; Dorschner, Michael; Stamatoyannopoulos, John; Schwartz, David; Nickerson, Deborah A.; Mullikin, James C.; Wilson, Richard K.; Bruhn, Laurakay; Olson, Maynard V.; Kaul, Rajinder; Smith, Douglas R.; Eichler, Evan E.

2008-01-01

390

Genome sequence and analysis of the tuber crop potato.  

PubMed

Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop. PMID:21743474

Xu, Xun; Pan, Shengkai; Cheng, Shifeng; Zhang, Bo; Mu, Desheng; Ni, Peixiang; Zhang, Gengyun; Yang, Shuang; Li, Ruiqiang; Wang, Jun; Orjeda, Gisella; Guzman, Frank; Torres, Michael; Lozano, Roberto; Ponce, Olga; Martinez, Diana; De la Cruz, Germán; Chakrabarti, S K; Patil, Virupaksh U; Skryabin, Konstantin G; Kuznetsov, Boris B; Ravin, Nikolai V; Kolganova, Tatjana V; Beletsky, Alexey V; Mardanov, Andrei V; Di Genova, Alex; Bolser, Daniel M; Martin, David M A; Li, Guangcun; Yang, Yu; Kuang, Hanhui; Hu, Qun; Xiong, Xingyao; Bishop, Gerard J; Sagredo, Boris; Mejía, Nilo; Zagorski, Wlodzimierz; Gromadka, Robert; Gawor, Jan; Szczesny, Pawel; Huang, Sanwen; Zhang, Zhonghua; Liang, Chunbo; He, Jun; Li, Ying; He, Ying; Xu, Jianfei; Zhang, Youjun; Xie, Binyan; Du, Yongchen; Qu, Dongyu; Bonierbale, Merideth; Ghislain, Marc; Herrera, Maria del Rosario; Giuliano, Giovanni; Pietrella, Marco; Perrotta, Gaetano; Facella, Paolo; O'Brien, Kimberly; Feingold, Sergio E; Barreiro, Leandro E; Massa, Gabriela A; Diambra, Luis; Whitty, Brett R; Vaillancourt, Brieanne; Lin, Haining; Massa, Alicia N; Geoffroy, Michael; Lundback, Steven; DellaPenna, Dean; Buell, C Robin; Sharma, Sanjeev Kumar; Marshall, David F; Waugh, Robbie; Bryan, Glenn J; Destefanis, Marialaura; Nagy, Istvan; Milbourne, Dan; Thomson, Susan J; Fiers, Mark; Jacobs, Jeanne M E; Nielsen, Kåre L; Sønderkær, Mads; Iovene, Marina; Torres, Giovana A; Jiang, Jiming; Veilleux, Richard E; Bachem, Christian W B; de Boer, Jan; Borm, Theo; Kloosterman, Bjorn; van Eck, Herman; Datema, Erwin; Hekkert, Bas te Lintel; Goverse, Aska; van Ham, Roeland C H J; Visser, Richard G F

2011-07-10

391

Castor bean organelle genome sequencing and worldwide genetic diversity analysis.  

PubMed

Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

2011-07-07

392

Genome sequence and description of Aeromicrobium massiliense sp. nov.  

PubMed

Aeromicrobium massiliense strain JC14(T) sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic rod-shaped gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119 bp long genome contains 3,296 protein-coding and 51 RNA genes. PMID:23408663

Ramasamy, Dhamodharan; Kokcha, Sahare; Lagier, Jean-Christophe; Nguyen, Thi-Thien; Raoult, Didier; Fournier, Pierre-Edouard

2012-11-15

393

Draft Genome Sequence of Pseudoalteromonas luteoviolacea Strain B (ATCC 29581).  

PubMed

We report the 4.049-Mbp high-quality draft assembly of the Pseudoalteromonas luteoviolacea strain B (ATCC 29581) genome. This marine species is known to biosynthesize several antimicrobial compounds, including the purple pigment violacein. Whole-genome sequencing and genome mining will complement experimental studies aimed at elucidating novel biosynthetic pathways capable of producing pharmaceutically relevant molecules. Based upon 16S rRNA phylogenetic analysis, we propose that strain ATCC 29581 be classified as a distinct phylogenetic species of the genus Pseudoalteromonas. PMID:23516191

Cress, Brady F; Erkert, Kelly A; Barquera, Blanca; Koffas, Mattheos A G

2013-02-28

394

Draft Genome Sequence of Pseudoalteromonas luteoviolacea Strain B (ATCC 29581)  

PubMed Central

We report the 4.049-Mbp high-quality draft assembly of the Pseudoalteromonas luteoviolacea strain B (ATCC 29581) genome. This marine species is known to biosynthesize several antimicrobial compounds, including the purple pigment violacein. Whole-genome sequencing and genome mining will complement experimental studies aimed at elucidating novel biosynthetic pathways capable of producing pharmaceutically relevant molecules. Based upon 16S rRNA phylogenetic analysis, we propose that strain ATCC 29581 be classified as a distinct phylogenetic species of the genus Pseudoalteromonas.

Cress, Brady F.; Erkert, Kelly A.; Barquera, Blanca

2013-01-01

395

Genome sequence and description of Aeromicrobium massiliense sp. nov.  

PubMed Central

Aeromicrobium massiliense strain JC14T sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic rod-shaped gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119 bp long genome contains 3,296 protein-coding and 51 RNA genes.

Ramasamy, Dhamodharan; Kokcha, Sahare; Lagier, Jean-Christophe; Nguyen, Thi-Thien; Raoult, Didier

2012-01-01

396

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome  

Microsoft Academic Search

Background: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D.

Casey M Bergman; Barret D Pfeiffer; Diego E Rincón-Limas; Roger A Hoskins; Andreas Gnirke; Chris J Mungall; Adrienne M Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth Wan; Reed A George; Pieter J de Jong; Juan Botas; Gerald M Rubin; Susan E Celniker

2002-01-01

397

The nucleotide sequence and genome organization of Plasmopara halstedii virus  

Microsoft Academic Search

Background: Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Methods: Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The

Marion Heller-Dohmen; Jens C Göpfert; Jens Pfannstiel; Otmar Spring

2011-01-01

398

Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing  

PubMed Central

Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.

2011-01-01
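
The record above reports a 0.5× genome and notes that low coverage mainly characterizes the high copy fraction. A minimal sketch of why that happens follows, assuming reads land uniformly at random so that per-base depth is roughly Poisson distributed (a Lander-Waterman style simplification that is not stated in the abstract). The copy numbers in the demo are hypothetical.

```python
import math

def fraction_covered(coverage):
    """Expected fraction of positions hit by at least one read under a Poisson model."""
    return 1.0 - math.exp(-coverage)

def effective_coverage(genome_coverage, copy_number):
    """Approximate depth seen by a sequence present in `copy_number` copies per genome."""
    return genome_coverage * copy_number

if __name__ == "__main__":
    genome_coverage = 0.5  # the 0.5x figure from the abstract
    # Hypothetical copy numbers, chosen only to contrast low-copy and high-copy sequence.
    for label, copies in [("single-copy gene", 1), ("repeat family", 50), ("organelle genome", 200)]:
        depth = effective_coverage(genome_coverage, copies)
        print(f"{label}: depth ~{depth:.1f}x, fraction covered ~{fraction_covered(depth):.3f}")
```

Under this model a single-copy region at 0.5× is sampled at well under half of its positions, while sequences present in many copies reach enough depth to assemble, which is consistent with the chloroplast and rDNA results described in the abstract.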

399

Establishing a framework for comparative analysis of genome sequences  

SciTech Connect

This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

Bansal, A.K.

1995-06-01
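
The idea of treating a multiple sequence alignment as an abstract data type and extracting "constrained columns" can be sketched as follows. This is written in Python rather than the Prolog toolkit the record describes, it leaves out the phylogenetic tree and parsimony step entirely, and the residue property classes are a simplified assumption made for illustration. All names are hypothetical.

```python
from dataclasses import dataclass

# Simplified, illustrative amino acid property classes (an assumption, not from the paper).
PROPERTY_CLASSES = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}

@dataclass(frozen=True)
class Alignment:
    """A multiple sequence alignment exposed through a few abstract operations."""
    rows: tuple  # equal-length aligned sequences

    def __post_init__(self):
        if len({len(r) for r in self.rows}) > 1:
            raise ValueError("all aligned sequences must have the same length")

    def n_columns(self):
        return len(self.rows[0]) if self.rows else 0

    def column(self, j):
        return [r[j] for r in self.rows]

def residue_class(residue):
    """Return the property class of a residue, or None for gaps and unknown symbols."""
    for name, members in PROPERTY_CLASSES.items():
        if residue in members:
            return name
    return None

def constrained_columns(aln):
    """Map column index -> shared class for columns whose residues vary
    but all fall in a single property class."""
    result = {}
    for j in range(aln.n_columns()):
        col = aln.column(j)
        classes = {residue_class(r) for r in col}
        if len(classes) == 1 and None not in classes and len(set(col)) > 1:
            result[j] = next(iter(classes))
    return result

if __name__ == "__main__":
    aln = Alignment(("AKDS", "VRDA", "LKDT"))
    print(constrained_columns(aln))
```

Columns that are fully identical are excluded here on purpose, since the abstract speaks of a property preserved "despite mutation"; a different reading of that phrase would change the filter.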

400

Deep Whole-Genome Sequencing of 100 Southeast Asian Malays  

PubMed Central

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-01

401

Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution  

PubMed Central

Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that promoter-like sequences in the M. hyopneumoniae genome are more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional.

Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

2012-01-01
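
The scanning step described in the record above, building a position-specific scoring matrix (PSSM) from known promoter elements and sliding it along upstream sequence, can be sketched generically as below. The example sites, the uniform background, the pseudocount and the threshold handling are all assumptions made for illustration; none of them come from the study.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PSSM from equal-length aligned sites (A/C/G/T only)."""
    background = background or {b: 0.25 for b in BASES}  # uniform background is an assumption
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for j in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pssm

def score(pssm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pssm, window))

def scan(pssm, seq, threshold):
    """Return (position, window, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        s = score(pssm, window)
        if s >= threshold:
            hits.append((i, window, s))
    return hits

if __name__ == "__main__":
    # Hypothetical -10-like elements, not the 23 mapped promoters from the study.
    sites = ["TATAAT", "TAAAAT", "TATAAT", "TACAAT"]
    pssm = build_pssm(sites)
    upstream = "GGCGCTATAATGCGC"  # hypothetical upstream region
    for pos, window, s in scan(pssm, upstream, threshold=0.0):
        print(pos, window, round(s, 2))
```

For an AT-rich genome such as a Mycoplasma, a background estimated from the genome itself would likely be a better choice than the uniform one used here, and the threshold would normally be calibrated against shuffled or random sequence.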

402

The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus).  

PubMed

We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes. PMID:19139089

Miller, Webb; Drautz, Daniela I; Janecka, Jan E; Lesk, Arthur M; Ratan, Aakrosh; Tomsho, Lynn P; Packard, Mike; Zhang, Yeting; McClellan, Lindsay R; Qi, Ji; Zhao, Fangqing; Gilbert, M Thomas P; Dalén, Love; Arsuaga, Juan Luis; Ericson, Per G P; Huson, Daniel H; Helgen, Kristofer M; Murphy, William J; Götherström, Anders; Schuster, Stephan C

2009-01-12
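
The divergence figures quoted in the abstract above are simple ratios. As a small worked sketch, five differences out of 15,452 positions comes to roughly 0.03%, far below the 11%-15% divergence reported against the earlier GenBank entries. The helper names and the short test sequences are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def percent_divergence(n_differences, length):
    """Express a difference count as a percentage of compared positions."""
    return 100.0 * n_differences / length

if __name__ == "__main__":
    # Figures taken from the abstract: 5 differing nucleotides out of 15,452.
    print(f"{percent_divergence(5, 15452):.4f}% divergence between the two specimens")
    # Tiny made-up sequences, only to show the counting step.
    print(count_differences("ACGTAC", "ACCTAA"), "differences in the toy pair")
```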

403

The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)  

PubMed Central

We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%–15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples’ heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes.

Miller, Webb; Drautz, Daniela I.; Janecka, Jan E.; Lesk, Arthur M.; Ratan, Aakrosh; Tomsho, Lynn P.; Packard, Mike; Zhang, Yeting; McClellan, Lindsay R.; Qi, Ji; Zhao, Fangqing; Gilbert, M. Thomas P.; Dalen, Love; Arsuaga, Juan Luis; Ericson, Per G.P.; Huson, Daniel H.; Helgen, Kristofer M.; Murphy, William J.; Gotherstrom, Anders; Schuster, Stephan C.

2009-01-01

404

The Genome Sequence DataBase (GSDB): improving data quality and data access.  

PubMed

In 1997 the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb) located at the National Center for Genome Resources was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects; one to identify and remove all vector contamination from sequences in the database and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the course of the last year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. And lastly, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene level studies by improving data quality and by providing data access capabilities that are consistent with the needs of both types of studies. PMID:9399793

Harger, C; Skupski, M; Bingham, J; Farmer, A; Hoisie, S; Hraber, P; Kiphart, D; Krakowski, L; McLeod, M; Schwertfeger, J; Seluja, G; Siepel, A; Singh, G; Stamper, D; Steadman, P; Thayer, N; Thompson, R; Wargo, P; Waugh, M; Zhuang, J J; Schad, P A

1998-01-01

405

Overview of PSB track on gene structure identification in large-scale genomic sequence  

SciTech Connect

The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

Uberbacher, E.C.; Xu, Y.

1998-12-31

406

The genome sequence of the model ascomycete fungus Podospora anserina  

PubMed Central

Background: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results: We present a 10X draft sequence of the P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with those of its close relative, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion: The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.

Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Segurens, Beatrice; Poulain, Julie; Anthouard, Veronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Dequard-Chablat, Michelle; Picard, Marguerite; Contamine, Veronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Veronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarre, Berangere; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

2008-01-01

407

Predicting Prokaryotic Ecological Niches Using Genome Sequence Analysis  

PubMed Central

Automated DNA sequencing technology is so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at the rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in those prokaryotes that occupy the same niche, and that this is due to both vertical and horizontal transmission. To test this, we have developed a novel method to classify prokaryotes, by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as ‘mountains’ on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate the ability of this map, which we term a “niche map”, to cluster according to ecological niche both quantitatively and qualitatively, and propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.

Suen, Garret; Goldman, Barry S.; Welch, Roy D.

2007-01-01
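
The classification idea in the abstract above (describe each genome by its protein domain distribution, then group genomes with similar distributions) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which distance measure or clustering algorithm the authors used, so Euclidean distance and a simple threshold-based single-linkage grouping are swapped in here, and the genome names and domain labels are made-up placeholders.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def domain_distribution(domains):
    """Turn a list of domain hits into relative frequencies."""
    counts = Counter(domains)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def distance(p, q):
    """Euclidean distance between two domain distributions (an assumed metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def cluster(genomes, threshold):
    """Single-linkage grouping: genomes closer than `threshold` share a cluster."""
    dists = {name: domain_distribution(hits) for name, hits in genomes.items()}
    parent = {name: name for name in genomes}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if distance(dists[a], dists[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy data: names and domain labels are placeholders, not real annotations.
toy = {
    "soil_A": ["DOM_1", "DOM_1", "DOM_2"],
    "soil_B": ["DOM_1", "DOM_1", "DOM_2", "DOM_1"],
    "marine_C": ["DOM_3", "DOM_3", "DOM_4"],
}
clusters = cluster(toy, threshold=0.3)
```

With this toy input the two "soil" genomes end up in one group and the "marine" genome in another, because their domain frequencies are far apart. A real analysis would start from actual Pfam annotations and would likely use a more careful clustering and visualization step.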

408

Sequence modelling and an extensible data model for genomic database  

SciTech Connect

The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have an abstraction mechanism for modelling sequences, and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States)]

1992-01-01
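
The abstract above describes a sequence model that is independent of application, with abstract operators plus biological operators layered on top. The sketch below shows one way that separation might look in code. All class and method names here are hypothetical illustrations; they are not the constructs or the query language defined in the original work.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AbstractSequence(Generic[T]):
    """Ordered collection of elements, independent of what the elements mean."""
    elements: Tuple[T, ...]

    # Abstract operators: valid for any kind of sequence.
    def subsequence(self, start: int, end: int) -> "AbstractSequence[T]":
        return type(self)(self.elements[start:end])

    def concatenate(self, other: "AbstractSequence[T]") -> "AbstractSequence[T]":
        return type(self)(self.elements + other.elements)

    def __len__(self) -> int:
        return len(self.elements)


_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass(frozen=True)
class DNASequence(AbstractSequence[str]):
    """A sequence of bases, adding a constraint and a biological operator."""

    def __post_init__(self):
        # Constraint: only the four unambiguous bases are accepted in this sketch.
        bad = set(self.elements) - set(_COMPLEMENT)
        if bad:
            raise ValueError(f"invalid bases: {sorted(bad)}")

    @classmethod
    def from_string(cls, text: str) -> "DNASequence":
        return cls(tuple(text.upper()))

    # Biological operator: only meaningful for DNA.
    def reverse_complement(self) -> "DNASequence":
        return type(self)(tuple(_COMPLEMENT[b] for b in reversed(self.elements)))

    def __str__(self) -> str:
        return "".join(self.elements)


dna = DNASequence.from_string("ATGC")
revcomp = str(dna.reverse_complement())
```

The generic base class carries the operators that apply to any ordered data (such as marker maps), while the subclass adds a validity constraint and a domain-specific operator, loosely mirroring the separation between constructors and constraints mentioned in the abstract.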

409

Sequence modelling and an extensible data model for genomic database  

SciTech Connect

The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have an abstraction mechanism for modelling sequences, and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

Li, Peter Wei-Der (California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States))

1992-01-01

410

Prostate cancer genomics by high-throughput technologies: genome-wide association study and sequencing analysis.  

PubMed

Prostate cancer (PC) is the most common malignancy in males. It is evident that genetic factors at both germline and somatic levels play critical roles in prostate carcinogenesis. Recently, genome-wide association studies (GWAS) by high-throughput genotyping technology have identified more than 70 germline variants of various genes or chromosome loci that are significantly associated with PC susceptibility. They include multiple 8q24 loci, prostate-specific genes, and metabolism-related genes. Somatic alterations in PC genomes have been explored by high-throughput sequencing technologies such as whole-genome sequencing and RNA sequencing, which have identified a variety of androgen-responsive events and fusion transcripts represented by E26 transformation-specific (ETS) gene fusions. Recent innovations in high-throughput genomic technologies have enabled us to analyze PC genomics more comprehensively, more precisely, and on a larger scale in multiple ethnic groups to increase our understanding of PC genomics and biology in germline and somatic studies, which can ultimately lead to personalized medicine for PC diagnosis, prevention, and therapy. However, these data indicate that the PC genome is more complex and heterogeneous than we expected from GWAS and sequencing analyses. PMID:23625613

Nakagawa, Hidewaki

2013-06-24

411

CoryneCenter - An online resource for the integrated analysis of corynebacterial genome and transcriptome data  

PubMed Central

Background: The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. Results: To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, and CoryneCenter also points out additional regulatory interactions. Conclusion: CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focusses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data. The system is publicly available at .

Neuweger, Heiko; Baumbach, Jan; Albaum, Stefan; Bekel, Thomas; Dondrup, Michael; Huser, Andrea T; Kalinowski, Jorn; Oehm, Sebastian; Puhler, Alfred; Rahmann, Sven; Weile, Jochen; Goesmann, Alexander

2007-01-01

412

Mitochondrial DNA sequences in the nuclear genome of a locust.  

PubMed

The endosymbiotic theory of the origin of mitochondria is widely accepted, and implies that loss of genes from the mitochondria to the nucleus of eukaryotic cells has occurred over evolutionary time. However, evidence at the DNA sequence level for gene transfer between these organelles has so far been limited to a single example, the demonstration that a mitochondrial ATPase subunit gene of Neurospora crassa has an homologous partner in the nuclear genome. From a gene library of the insect, Locusta migratoria, we have now isolated two clones, representing separate fragments of nuclear DNA, which contain sequences homologous to the mitochondrial genes for ribosomal RNA, as well as regions of homology with highly repeated nuclear sequences. The results suggest the transfer of sequences between mitochondrial and nuclear genomes, followed by evolutionary divergence. PMID:6298629

Gellissen, G; Bradfield, J Y; White, B N; Wyatt, G R

413

Complete genome sequence of Thauera aminoaromatica strain MZ1T  

PubMed Central

Thauera aminoaromatica strain MZ1T, an isolate belonging to the genus Thauera, of the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774.

Jiang, Ke; Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas; Dette